cgattcagcccgttttattttaaaaagttatggttagatgcatgggaagaaacgcatgaa
cgtcgattggttgatgaaatttatcgtgttccaaccgaagatactgatcaagcaaaagca 
aaagtacaacttgcgttttgtattgatgtacgatccgaaccgtttagaagacatttagaa 
agtgaagggccttttgaaacaatagggattgctgggttctttggtctgccaattcaaaaa 
gaagtacttgatgaacaatttgcacatccatctttaccagtgatggtagaacctgcatat
cgtattaaagaatatgctgatcaacatgaaatgaaaatttataatcaacaacaacataca
cttacatctatgttttacaactttaaattaatgaaaaacaacgtgttgccaagtttgctt 
ttaccagaattaagtggtccttttttaagtattgcgactatagctaacacaattttccct 
aaaaaagcaaaacgtattgttcatcgattctcacaaaaatggctacgtaaaccaacaggt 
aaattaactattcagcgtgagcaagatgcgtattcaaaactaccaatcggctttacttta
gaagaacagattcaattttccaaaaaagcattacaattaatggacttaacagatgatttt
gcaccacttattgttctatgcggacacggtagtgaatcacataataatccctatcatgct 
tcattagagtgtggggcttgcggcggtgcctcgagtgggttcaatgcgaaattattagca 
gtaatgtgtaatcaagaaaatgttagacgtggattattgatggaaggaatcgacattcca 
agacatacagtgtttatagctgctgaacatcaaacgtcagttgatgagttagagtatatt
tatgttccacctttaactacagaagctcaaaatgcctttgacgaacttaagcatgtgatg 
ccaaaagtatgttataaagccaatttagaacgtttggcatcgttgccaaatataaataac 
actgaccataatcctaatgctgaagcgcatcgtcacgctagcgattggagtgaagttcgt 
ccagaatggggtctagcacgaaatgctgaattcattattgggaaacgtcaaatcacccaa 
aatagtaatctagagggacgggcatttcttcataattatgattggacaaaggatgaagac
ggtgagattttaaatacaattatttctgggccagcactagtagcacaatggattaattta 
caatactacgcctcaaccgtggcacctcactattatggaagcggtagtaaaacaacgcaa 
actgtaacaagtggtgtaggtgtcatgcaaggaaatgctagtgatttaatgtatggctta 
ccatggcagtcagtaatgatgaatgacaaagaggcgtatcacgcacctattaggctttta 
attgttattcaagcgccagatgcatatattcaacgtttgttaaaacatcataatcacttt
agacaaaaggttgatcatcaatggataagacttgccagtattgatgaaaataatagttgg 
aaagactggtag 

Sequence 2010 

VKLPYGVQQDAHEVEDALEFINRVITPLSPISTFAARNPWEGLEDASFDQVARWLKSVRD 
VDiirrWv.oTIHRAISNKEIDLKVFEERLDENRAHYNNRSLSDSDINTYIQRAKNU:riEE 
GYFNTKDNEKLEKWVQTNFKDYKKKEDVIAQSASVFTKEGTRLIDILNAHMIKWSKLYVD 
DFQSSWTMPKREKGFYHAWQRLVKHDPLFTKKQRLTLAHLPNQATEAIEYAFQELGVKEE 
HRQSYIESHLLSLPGWAGIMYHRSQTQSNDAYLLTDYVAIRLSIEMVLLNDHHTTLLKKS 
lYLQKKLEQIRYLLFNIQMNVEQWLNLSSKKQQAYIELGTRFSPFYFKKLWLDAWEETHE 
RRLVDEIYRVPTEDTDQAKAKVQLAFCIDVRSEPFRRHLESEGPFETIGIAGFFGLPIQK 
EVLDEQFAHPSLPVMVEPAYRIKEYADQHEMKIYNQQQHTLTSMFYNFKLMKNNVLPSLL 
LPELSGPFLSIATIANTIFPKKAKRIVHRFSQKWLRKPTGKLTIQREQDAYSKLPIGFTL 
EEQIQFSKKALQLMDLTDDFAPLIVLCGHGSESHNNPYHASLECGACGGASSGFNAKLLA 
VMCNQENVRRGLLMEGIDIPRHTVFIAAEHQTSVDELEYIYVPPLTTEAQNAFDELKHVM 
PKVCYKANLERLASLPNINNTDHNPNAEAHRHASDWSEVRPEWGLARNAEFIIGKRQITQ 
NSNLEGRAFLHNYDWTKDEDGEILNTIISGPALVAQWINLQYYASTVAPHYYGSGSKTTQ 
TVTSGVGVMQGNASDLMYGLPWQSVMMNDKEAYHAPIRLLIVIQAPDAYIQRLLKHHNHF 
RQKVDHQWIRLASIDENNSWKDW* 

Sequence 2011 
Contig_0693_pos_971_603, 
putative peptide of unknown function 

atgaaaaagacgaaaggtatttatgaatctgaaattagtaaagccattacacaatgggag
aaagattttttaggaagaggttctttgtctgtaaaaacagatattcttcgtgatatggtt
attgttcatttacaaggtattttgacacctgctgaatatcgcgtgtgtaaaacgaatgaa
gggcttttaaatattaaacgtacacgttctgaacttgtagagtctggtgaagaggacttg 
agtcgcattattaaagatttgactggacttaatgtgaaaagttttcatagtgatttaagt 
actattaccggtgaacgcgtaatgatttttaagttggaagatcgttttgataaagcatta 
catgagtaa 

Sequence 2012 

MKKTKGIYESEISKAITQWEKDFLGRGSLSVKTDILRDMVIVSLQGILTPAEYRVCKTNE 
GLLNIKRTRSELVESGEEDLSRIIKDLTGLNVKSFHSDLSTITGERVMIFKLEDRFDKAL 
HE*

Sequence 2013
Contig_0694_pos_1943_2533,
putative peptide of unknown function

atgatacaacgtaaaggtgaattaatactttcatggattggcaatggattacatctgttg 
tatgtatttttaataggcatatttttcataatgacacaaacaagtgattttaaaaatggg 
atgattcaaggatttatagaagaaaatccgggggaatatgatttagcataccaaacttat 
aacttgatgttaggtttaggtgttgtcctcattattatacttctcattttattaattgta 

tcaatagttgccgctattttaattggcaaaaatgccaaagtatcaggaatattacttgtt
attacaggaatcattggtctctttttaagttttattgctggtgccttatggttaatagca 
ggtatcatgcttttagtacgtaaaccacaaacacaaaatgaccaaatcaattctcaatat 
agtaatgacatacattcacatgttgtgccagaagaaaagaaacgtgaacaacaacaatat 
aatatgaatgaaccacatattggtcaaacatcaacatctcatcatgatcacgcattgaat 

gatcaaaataaacgagaaaaccataatcatgataatcaaccatacaaatag

Sequence 2014 

MIQRKGELILSWIGNGLHLLYVFLIGIFFIMTQTSDFKNGMIQGFIEENPGEYDLAYQTY
NLMLGLGVVLIIILLILLIVSIVAAILIGKNAKVSGILLVITGIIGLFLSFIAGALWLIA 
GIMLLVRKPQTQNDQINSQYSNDIHSHVVPEEKKREQQQYNMNEPHIGQTSTSHHDHALN
DQNKRENHNHDNQPYK* 

Sequence 2015 

Contig_0694_pos_4776_6005,

putative peptide of unknown function

atgcaaacagtcggaattataccttcgccaggtatagcacatcaacatgcaaaaaaaata 
attccaaatgttaaacagttattgtcaaagcgtactaaacatagtcaatggaatttcgac 
atcaaagtcgatctcatgataggatctgcagaggatgtacatgaaagtgttgaaaaagca 
gcacaaattaaagaggaacatcagtgggattacgttgtttgtctgacagatttgcctagt 

atttcagataataaagtggttgtcagcgactttaatagtgacaaacatgttgcaatgcta
tcattaccgtcactaggttttattgatttgaagcgcaagctagttaaaacgatgacttca
ttgattgaacaattatattataatcaaccgaaagacaaaaatgcgccacatccttttgta
cgcgtgaaggctgtagaacctgacgaagacgccacatcaaaacaacgatatattaatatt 
ttatttatcataagttggattcagttaattggtggactgacacgagcaaatcagccttgg 

aaaaacatctttaattttaagaaaatcatttcagttgcctttgcaacaggaacttatgtc
tcaatattttcaatgccatgggaattaagcgtgatttattcaccgcttcgacttatcata 
ttgatggtgattgctatacttgggatggctggatggctattctatgcgcatcaattgatt 
gaaaagaaaactgctaaatctcagcgtgtatatcgatatatttataattcaaccacactt 
gttacactaagtttgattacactcataaattatgtcattttatatttattgttaatcatc

agtattacactctttgtccctgtggaattatttaatagttggacgagtgcccaatcacaa
tttacgttctcaaattatatgagattgatttggtttgtatcatcattaggactttt?.gct 
ggag jcr.i gggatcaactgttgaaaatgaagagaaaatacgtcgtattacttattcctat 
agacaatatcatcgttataaagaagctggcaagaacaaaaagaacaagaaacttctcgtg 
atgtatcacaacaaaatgtcgaacaacaaacttcaagtaaagatgaaaataatgaacaat 

atgaaggtaaaaaacaaggacatagagaggaggatgacgcatgacaaatcaaaaaactgt
gggtctagtcgtcgctccaggtgttactga 

Sequence 2016 

MQTVGIIPSPGIAHQHAKKIIPNVKQLLSKRTKHSQWNFDIKVDLMIGSAEDVHESVEKA 
AQIKEEHQWDYVVCLTDLPSISDNKVVVSDFNSDKHVAMLSLPSLGFIDLKRKLVKTMTS
LIEQLYYNQPKDKNAPHPFVRVKAVEPDEDATSKQRYINILFIISWIQLIGGLTRANQPW 
KNIFNFKKIISVAFATGTYVSIFSMPWELSVIYSPLRLIILMVIAILGMAGWLFYAHQLI 
EKKTAKSQRVYRYIYNSTTLVTLSLITLINYVILYLLLIISITLFVPVELFNSWTSAQSQ 
FTFSNYMRLIWFVSSLGLLAGAMGSTVENEEKIRRITYSYRQYHRYKEAGKNKKNKKLLV 
MYHNKMSNNKLQVKMKIMNNMKVKNKDIERRMTHDKSKNCGSSRRSRCY*

Sequence 2017 

Contig_0696_pos_564 0_5990, 

is similar to (with p-value 1.0e-27)

>gp:gp|AB015981|AB015981_2 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c 
ds. NID: g4001723. 

atgccagtgataagcaaactgaataataatccgagaccatccacatataaattaaaattc 
attccaatatgtggcatccaatttaagtttttcattacgttattacctgacatcgtcatt
gaaatcaatgatatgaaatatataaataagacaataggaacaggtaatacaaaccatcct 
aaatgtatacgtttaacgaaccggtataggaatggaataatgagtgcaaatattaacggt 
aatagtaccgcaatatgtaacaaactcactatgttttcctcctttaaaatttatttacga 
cacatattatacatcattgcgcttcttctgaaaaaccatgtaactgcttga 

Sequence 2018 

MPVISKLNNNPRPSTYKLKFIPICGIQFKFFITLLPDIVIEINDMKYINKTIGTGNTNHP 
KCIRLTNRYRNGIMSANINGNSTAICNKLTMFSSFKIYLRHILYIIALLLKNHVTA* 

Sequence 2019

Contig_0696_pos_5913_3508, 

is similar to (with p-value 0.0e+00)

>gp:gp|AB015981|AB015981_2 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c
ds. NID: g4001723.

gtgagtttgttacatattgcggtactattaccgttaatatttgcactcattattccattc 
ctataccggttcgttaaacgtatacatttaggatggtttgtattacctgttcctattgtc 
ttatttatatatttcatatcattgatttcaatgacgatgtcaggtaataacgtaatgaaa 
aacttaaattggatgccacatattggaatgaattttaatttatatgtggatggtctcgga 

ttattattcagtttgcttatcactggcataggaagtctagtggttttatattctatcgga
tatttaagcaaatcagaacaactcgggaatttttactgttatttattactatttatgggt 
gcaatgttaggagttgtactctctgataattttattattttatatttattttgggagtta 
acttctttctctagctttttattaatctctttctggagagagaaaaaagcttcaatttat 
ggtgctcaaaaatctttaatcattaccgtattaggtggcttgagcatgctaggtggtatc 

attttactttccctagctactgatacttttagtattcaggctatgatttcaaaagcaagt
gacattcaaaatagtcctttctttatcttagtaatgatactttttatgattggtgcattt 
acaaaatctgcacaagtgcctttttatatttggttaccagatgctatggaggcgcctacg 
cccgtgagtgcataccttcattctgcaacgatggtaaaagcaggactatatctaatcgca
agaatcacacctatttttgcaatatccgaaggttgggtatggacaattacacttgctggt

ttaatcaccctattttgggcatcactcaatgcaacaaaacaacatgacttaaaaggtatt
ttagctttctcaactgtgtctcaactagggatgattatgtctatgcttggtattggtgct 
gtaagttatcattatcaaggcgctaatagtcaactttatgttgctggatttgttgctgcc 
atatttcatttaattaatcatgccacgtttaaaggtgcactatttatgattacaggtggt 
attgatcattcaactggtacacgtgatgttaaaaagttgggcggtttacttacaatcatg

cctatctcattcacgcttacagttattacaacattaagtatggctggtgtgccgcctttt
aacggctttttatcaaaagagaaattcttagagtcaatgattaatgttacacatttaaat
ttaatgagtttaaatactttaggtattcttttaccaatcattgccattattggtagtatt 
ttcacatttgtatattcaattaaatttatattgcatatattctttggttcttataaacct 
gaagctctgccaaaacaagcgcatgaatcttcaattttaatgcttatttcacctatcatt 

ttaacatcactagttatagtattcggtttattcccaagtatattaacgcaatctattata
gagccggcatctgtagcagttagtcaaacatcaaatataactgctgagttccatttattc 
catggtataactccagcattcctatcaacaataggtatttatattattggtattttatta
ttaatttcatttagttattgggttcgtttattacaagcacatccatatcagttaacgttg 
aatcattggtatgacacttcaggccaacgtattccaggatattccgaaaatataacaaat 

agttatgttacaggtttttctagaaataatttggtgattatcttaggtattctcattgct
ttaacttttgttacagtcatcagtgtacccttcagtattgactttaaaaacgtgagtcat 
ttacgcgtatttgaaggtgcaacagtattgtttttactgattgcttcaactttcattata 
tt».gcBaaatcacgtttgtttagcattatcatgttaagcgctgtgggttacgctatdtca 
gtattatttattttctttaaagcgcccgacttagcattaacacaatttgtagtggaatct 

atttctacagcattattcttactatgcttttatcacctacctaatttaaatcgctacaat
gaaaaaccaacctttaaactgacaaatgctgtgatttcaattggagtgggattatcagtg 
attattttaggattaattggctatggtaatagacactttgactctattactaaattctat 
caagaacatgtttttgatttagcacatggtaaaaatatggtaaatgtcatcctcgtagat
ttccgtggtatggatactttattcgagtcatctgtactaggtattgcaggtttaggcgta 
tatacaatgattaaattacgattaaaacagaaaaatcaatcaagtgaggtgaatgaccat
gaatag 

Sequence 2020 

VSLLHIAVLLPLIFALIIPFLYRFVKRIHLGWFVLPVPIVLFIYFISLISMTMSGNNVMK
NLNWMPHIGMNFNLYVDGLGLLFSLLITGIGSLVVLYSIGYLSKSEQLGNFYCYLLLFMG
AMLGVVLSDNFIILYLFWELTSFSSFLLISFWREKKASIYGAQKSLIITVLGGLSMLGGI 
ILLSLATDTFSIQAMISKASDIQNSPFFILVMILFMIGAFTKSAQVPFYIWLPDAMEAPT 
PVSAYLHSATMVKAGLYLIARITPIFAISEGWVWTITLVGLITLFWASLNATKQHDLKGI

LAFSTVSQLGMIMSMLGIGAVSYHYQGANSQLYVAGFVAAIFHLINHATFKGALFMITGG
IDHSTGTRDVKKLGGLLTIMPISFTLTVITTLSMAGVPPFNGFLSKEKFLESMINVTHLN 
LMSLNTLGILLPIIAIIGSIFTFVYSIKFILHIFFGSYKPEALPKQAHESSILMLISPII 
LTSLVIVFGLFPSILTQSIIEPASVAVSQTSNITAEFHLFHGITPAFLSTIGIYIIGILL 
LISFSYWVRLLQAHPYQLTLNHWYDTSGQRIPGYSENITNSYVTGFSRNNLVIILGILIA 

LTFVTVISVPFSIDFKNVSHLRVFEGATVLFLLIASTFIIFAKSRLFSIIMLSAVGYAIS
VLFIFFKAPDLALTQFVVESISTALFLLCFYHLPNLNRYNEKPTFKLTNAVISIGVGLSV 
IILGLIGYGNRHFDSITKFYQEHVFDLAHGKNMVNVILVDFRGMDTLFESSVLGIAGLGV 
YTMIKLRLKQKNQSSEVNDHE* 

Sequence 2021

Contig_0696_pos_3473_3087, 

is similar to (with p-value 4.0e-57) 

>gp:gp|AB015981|AB015981_3 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c
ds. NID: g4001723.

gtgattattttcttcatggtcattgtgtttggcttttcattatttttagctggacattat
acacctggtggcggtttcgttggtggtttgcttttcgctagtgcattattagttattaca 
attgcatatgatgtaaaaacgatgcgaaagatttttcctttagattttaaaatcttaatt 
ggtattggtttattattttgcgtgggtacaccattaacaagttggttcatgtctaaaaac 

ttttttacacatgtcacttttgacatccctttgcctttacttgaacctatgcacatgacg
acagcgatgttttttgatttcggtgttttatgtgcagttgtaggaactattatgactata 
attatttcgattggagagaacgaatag 

Sequence 2022 

VIIFFMVIVFGFSLFLAGHYTPGGGFVGGLLFASALLVITIAYDVKTMRKIFPLDFKILI
GIGLLFCVGTPLTSWFMSKNFFTHVTFDIPLPLLEPMHMTTAMFFDFGVLCAVVGTIMTI 
IISIGENE* 

Sequence 2023 
Contig_0696_pos_3075_2740,

is similar to (with p-value 2.0e-37) 

>gp:gp|AB015981|AB015981_4 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c
ds. NID: g4001723. 

atgatctttgttagtggtattctcacatctataagtgtctatctcgttttgtctaaaagt
ttgatacgtatcattatggggactacactactaactcatgctgcaaatttatttttaatt 
actatgggaggtttaaagcacggaactgttccaatatttgaaaaaggaacatcaagctat 
gttgaccctatcccccaagcattgattttaacagctatcgttatcgcctttgctacaaca 
gctttctttttagttcttgcatttagaacatataaagaactaggcactgataacgttgag 

ctaatgaaaggagcgccagaagatgatagagagtaa

Sequence 2024 

MIFVSGILTSISVYLVLSKSLIRIIMGTTLLTHAANLFLITMGGLKHGTVPIFEKGTSSY 
VDPIPQALILTAIVIAFATTAFFLVLAFRTYKELGTDNVELMKGAPEDDRE* 

Sequence 2025 

Contig_0696_pos_2408_1257, 

is similar to (with p-value 0.0e+00)

>gp:gp|AB015981|AB015981_5 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c
ds. NID: g4001723. 

atgtttatgttgattggtattattggctcatttacaacaggagatattttcaacttgttt 
gtgttctttgaagtctttttaatgtcttcatattgtttactcgttattggtactactaaa 
atacaattacaagaaacaattaagtatattttagtcaatgttgtttcatcgtctttcttt
gtcatgggtgttgcagttttatattcagttgtaggaactttaaatctcgctcatattagt 
gaaagattgtcacaactttctgtacatgacagtggcttagtcaatattgtttttatttta 
tttatctttgtctttgccactaaagcaggcgtttttcctatgtacgtatggctacctggt 
gcttattatgcccctccagtagcgatcatcacgttctttggtgcactattgactaaagtg 

ggtgtatacgcaattgcgagaactctaagtttattctttaataatacagtaagcttttct
cattatgtcatccttttcttagcattacttacaattatttttggatgtataggtgcgata 
gcttactatgatacgaagaaaatcatcctttacaatattatgattgcagtaggtgtcata 
ttagttggtattgctatgatgaacgaatcaggcatgactggtgcaatatattacacacta 
catgatatgttagttaaagcttcattgttcttactcattggcgtcatgtacaaaatcact 

aaaacgactgacttacgtcattttggtggcttgataaaagggtatcctattctaggttgg
acr'ttctttattgcagcgctaagcttagcgggtataccaccttttagtggtttctcv'jggt 
aaattctatattgttcgagcgacctttgaaaaaggattttatctaagtggtatcattgta
cttttatcaagtttaatcgtgttatattcagtcatacgtattttcttaaaaggatttttc 
ggtgaagttgaaggatatactttatctaaaaaggtaaatgttaaatatctaacaactatc

gctgttgcatctacagttattactgtaatctttggattatctgcagacacgttattccca
atcatcaaagatggcgctgaaacgtttgtcgatccaagtcaatatattcatagtgtgtta 
ggaggtaaatag 

Sequence 2026 

MFMLIGIIGSFTTGDIFNLFVFFEVFLMSSYCLLVIGTTKIQLQETIKYILVNVVSSSFF
VMGVAVLYSVVGTLNLAHISERLSQLSVHDSGLVNIVFILFIFVFATKAGVFPMYVWLPG
AYYAPPVAIITFFGALLTKVGVYAIARTLSLFFNNTVSFSHYVILFLALLTIIFGCIGAI 
AYYDTKKIILYNIMIAVGVILVGIAMMNESGMTGAIYYTLHDMLVKASLFLLIGVMYKIT 
KTTDLRHFGGLIKGYPILGWTFFIAALSLAGIPPFSGFYGKFYIVRATFEKGFYLSGIIV

LLSSLIVLYSVIRIFLKGFFGEVEGYTLSKKVNVKYLTTIAVASTVITVIFGLSADTLFP
IIKDGAETFVDPSQYIHSVLGGK*

Sequence 2027 
Contig_0696_pos_1081_776,
is similar to (with p-value 3.0e-40)

>gp:gp|AB015981|AB015981_6 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c 
ds. NID: g4001723. 

atgcttatcattacatttttaactgagttaataaaagcaaactttggtgtactaaaaatt 
attctcaaaccacgaattgagaataaacccggattctttgtgtacgagacggaattagaa
cgtgactggcaacttgttttactttccaacttgattacgttaacacctggcacagtcgtt 
ttaggtattagtgatgaccgtaaaaagatttatatccactcaattgatttcagtacaaag 
gaagaagagattcaaaatatcaaatcttcattagagaaggtcgttagaaaggtaggcgag 
aaataa 

Sequence 2028 

MLIITFLTELIKANFGVLKIILKPRIENKPGFFVYETELERDWQLVLLSNLITLTPGTVV
LGISDDRKKIYIHSIDFSTKEEEIQNIKSSLEKVVRKVGEK* 

Sequence 2029

Contig_0697_pos_2646_3434,

is similar to (with p-value 3.0e-34) 

>sp:sp|P06696|TNPA_STAAU TRANSPOSASE A (TRANSPOSON TN554). > 
pir:pir|A24584|A24584 transposition regulatory protein tnpA
- Staphylococcus aureus transposon Tn554 >gp:gp|X03216|ISTN5
54_1 Staphylococcus aureus transposon Tn554. NID: g4 372(>. >g
p:gp|K02987|TRN554_1 Transposon Tn554 (from S. aureus), compl
ete, containing transposition genes tnpA, tnpB and tnpC, and
antibiotic resistance genes ermA and spc. NID: g154920.

gtgtatacttatactattaaaattagaaatgagattattatgaaaatagtagaagtaaaa
tctaagaatggtaccaattttatgattttagatggtaataatgaacctatagtagatgca 
gtaagatatttgaagtatctggatagtgttaagaaaagtttaaataccaagaaaacctat 
gcctatgcactaaaaaatttttttgtttacttagaaagtaaaaagatatgctataaagaa 
gttagttttgataactttgttgattttataagatggatgaaaacaccttttgaatatgag
aatgtcctctcttatcaccgaaaagaaaaaagcattagtcctaagacaattaatctgact 
atgactgtagtatctaatttttatgattatctctataggagtaaaaaattagatgttaat 
ttctatgattttatgcatatggaaagtaaatactctaaaaaatataaaagtttcatgcat 
cacataaataaggactatagaacgttgaaaaatattttgaaagttaaagaaccaaagaaa 
aaaatagaagtgttaactaatgcggaggttaagaaattattagaggaagctaataatatt
agagataaattcttaatacaattactatatgaaaccggattacgtataggtgaggtatta 
tcattacgtattgatgatattaaatttgactttagaaacccatcgattaaaacgatacta 
ggtggtacaggaatatcaacctgttatccatcgcctacgcctgtcggcctcagcttagga 
cccgactaa 

Sequence 2030 

VYTYTIKIRNEIIMKIVEVKSKNGTNFMILDGNNEPIVDAVRYLKYLDSVKKSLNTKKTY 
AYALKNFFVYLESKKICYKEVSFDNFVDFIRWMKTPFEYENVLSYHRKEKSISPKTINLT 
MTVVSNFYDYLYRSKKLDVNFYDFMHMESKYSKKYKSFMHHINKDYRTLKNILKVKEPKK 
KIEVLTNAEVKKLLEEANNIRDKFLIQLLYETGLRIGEVLSLRIDDIKFDFRNPSIKTIL
GGTGISTCYPSPTPVGLSLGPD* 

Sequence 2031 

Contig_0697_pos_5460_5116,

putative peptide of unknown function

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggcagcgaaactgcgaggtcaagcaaatcc 
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 

cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 2032 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR*

Sequence 2033 

Contig_0697_pos_1439_399,

putative peptide of unknown function 

atgcaaaaagtacgttctgatataatgactcatcgtggttctcactatgatttaggagta
aagactgctctttggttacaaaccactcctttattaaaaaatcgaaataaagaatggcga 
aagagaattccacgttttgatattgatgtcaaggaaacctatgatatatttcaaatctat 
tcgccacagatttgggaagaaataattggtatgcaagatgtattgaatctacctacaaaa 
caaatgattttaaattttggccattatcgatttactgatttaaaggacagtggttgcaca 

gtatataaaggtcgtgattttttagtccgaaattatgattatcatcctgcaacatatgat
ggtagatacttattatttcaacctaatgacgggggattatctcaaataggaccgacttca 
agagtgactggtagaatggatggtatgaacgagtatggtttagttatggcatataatttt
atgcatcgtaaaaagcctgcaaatggatttgtatgttacatggttggtcggctaatactt
gaaaattgcaaaaatgtaactgaagcaatcaaatttttaaaggaagtaccgcatcgtagt 

tcattcagttatatactaatggatagacattcgaattatgccattgtcgaagttacacct
cgatcaatagatgtaaggtatgaacatatatgcacaaatcattttgaattgcttacccat 
gaaaatagaaactatacaagagaatctaaagaacgcttaaatcgtgtaataaataaaaca 
actccttctacaaacaaagatatcgcattcaaattatttaacgacccgcaatacgaaatc 
tatagcaacctatttaaaagttggtctggtacaattcatacttcactatatgaacctaat 

tcattaatatcatggatggcattaggtcaaaacagtcatccaacctcaatcaatttttct
aattggttaaaaggaaagaaattgaatataaattactttgaaggcgaaatagatacacca 
ttaacttttgccacatactaa 

Sequence 2034 

MQKVRSDIMTHRGSHYDLGVKTALWLQTTPLLKNRNKEWRKRIPRFDIDVKETYDIFQIY
SPQIWEEIIGMQDVLNLPTKQMILNFGHYRFTDLKDSGCTVYKGRDFLVRNYDYHPATYD 

GRYLLFQPNDGGLSQIGPTSRVTGRMDGMNEYGLVMAYNFMHRKKPANGFVCYMVGRLIL 
ENCKNVTEAIKFLKEVPHRSSFSYILMDRHSNYAIVEVTPRSIDVRYEHICTNHFELLTH 
ENRNYTRESKERLNRVINKTTPSTNKDIAFKLFNDPQYEIYSNLFKSWSGTIHTSLYEPN
SLISWMALGQNSHPTSINFSNWLKGKKLNINYFEGEIDTPLTFATY* 

Sequence 2035 
Contig_0699_pos_1065_1427, 
is similar to (with p-value 1.0e-26)

>gp:gp|AB003188|AB003188_3 Micrococcus luteus hexs-a, menG,
hexs-b gene, complete cds. NID: g2982678. 

gtggcaaagttaaacattaacaacgaaataaagaaagtagaaaagcgacttgaagaagca 
attataagttctgatcaaacattacaagaagcctcattccatttactatcttcaggggga 
aaaagagttagacccgcttttgttattttaagtggtcaatttggctctaacaacaaacct
tcagaagacacgtatcgtgtagcagtagctttagaactaattcacatggctaccttagtc 
cacgatgatgtgatagataaaagtgataaacgtagaggccgactcactatttcaaaaaaa 
tgggaccaaagtacagctattttaacaggaaatttcttacttgctatggggctcaagcac 
tga 

Sequence 2036 

VAKLNINNEIKKVEKRLEEAIISSDQTLQEASFHLLSSGGKRVRPAFVILSGQFGSNNKP 
SEDTYRVAVALELIHMATLVHDDVIDKSDKRRGRLTISKKWDQSTAILTGNFLLAMGLKH
* 

Sequence 2037 

Contig_0699_pos_2988_3362,

is similar to (with p-value 1.0e-18)

>sp:sp|P31114|GRC3_BACSU PROBABLE HEPTAPRENYL DIPHOSPHATE SY
NTHASE COMPONENT II (EC 2.5.1.30) (HEPPP SYNTHASE) (SPORE GE
RMINATION PROTEIN C3). >gp:gp|M80245|BACVARGNS_5 B.subtilis
dbpA, mtr(A,B), gerC(1-3), ndk, cheR, aro(B,E,F,H), trp(A-F)
, hisH, and tyrA genes, complete cds. NID: g143798. >gp:gp|Z
99115|BSUB0012_214 Bacillus subtilis complete genome (sectio
n 12 of 21): from 2195541 to 2409220. NID: g2634478.

atgatcggacattatataggtatgagtttccaaataatagatgatgtgctagattttact 
agttctgaaaagaaacttggtaagccggttggtagtgaccttatgaatggtcatattaca 
ttacctgtactattagaaatgcgaaaaaataagacttttaaagataaaatttcacaactt 
aatcctgacagtcctcaacatgcctttgaaacttgtataacaataattagacagtccgaa
agcatagaacaatcaaaacaaataagtgaaaagtatttaaataaagcaatcaatttaatc
gatgaattagaggatggtcctaataaagaactatttagaaagcttattaaaaaaatggga 
agtcgaaataagtaa 

Sequence 2038 

MIGHYIGMSFQIIDDVLDFTSSEKKLGKPVGSDLMNGHITLPVLLEMRKNKTFKDKISQL
NPDSPQHAFETCITIIRQSESIEQSKQISEKYLNKAINLIDELEDGPNKELFRKLIKKMG 
SRNK* 



Sequence 2039 
Contig_0699_pos_4537_5598,

is similar to (with p-value O.Oe+00) 

>sp:sp|Q59803|AROC_STAAU CHORISMATE SYNTHASE (EC 4.6.1.4) (5 
-ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE PHOSPHOLYASE). >gp:gp|U319
79|SAU31979_3 Staphylococcus aureus chorismate synthase (aro
C) and nucleoside diphosphate kinase (ndk) genes, complete c
ds, dehydroquinate synthase (aroB) and geranylgeranyl pyroph
osphate synthetase homolog (gerCC) genes, partial cds. NID: 
g987495. 

atgttcaaacgtcaagggggatatggtcgggggcgtcgtatgaaaattgaaaaagacact
atagaaattgtgtcaggcgtcagaaatggctttactttaggaagtccaataactttagta
gttactaacgacgatttcactcattggagaaaaataatgggcgtcgcaccaattagtgat
gaagaaagagaaaatatgaaacgtaccattacaaaacctagaccgggccatgctgatctt 
ataggtggcatgaaatataatcatcgtgatttaagaaatgtgcttgagcgttcatctgct 
agagaaacagcagcaagagttgctgtgggtgctgtttcaaaaattcttttagaacaatta
gatatccacttatatagccgtgtagtcgaaattggtggtattaaagacaaaggtttatat 
gatgtagatatgttcaaaaataatgtagataaaaatgatgtacgtgtaattgacgaaaat 
attgcgcaacaaatgagagataaaatagatgaagcgaaaaaagacggagattcaatcggg 
ggcgtagttcaagtaatggctgaaaacatgcctattggagtgggaagttatgtacactat 

gaccgtaaattagatggacgcattgcacagggtgttgtgagtatcaacgccttcaaaggt
gtaagttttggtgagggatttaaagcagctgaaaaacctggtagcgaaattcaagatgaa 
attcattataatcaagattcaggctattttagagctacaaatcacttaggtggatttgaa 
ggaggcatgagtaatgggatgcctataattgttaatggtgtcatgaagcctattcctact 
ttatataaaccactaaactcagttgatattaatactaaagaagacttcaaagctactata 

gaacgctcagatagttgtgcagtgcccgcagctagcgtagtatgtgaacacgttgtcgct
tttgagttagcaaaagcagtactcgaagagtttcaatctaaccacatggaccaactcgta 
gcacaaattaaagagcgtcgacaactcaacatagaattttaa 

Sequence 2040 

MFKRQGGYGRGRRMKIEKDTIEIVSGVRNGFTLGSPITLVVTNDDFTHWRKIMGVAPISD
EERENMKRTITKPRPGHADLIGGMKYNHRDLRNVLERSSARETAARVAVGAVSKILLEQL 
DIHLYSRVVEIGGIKDKGLYDVDMFKNNVDKNDVRVIDENIAQQMRDKIDEAKKDGDSIG 
GVVQVMAENMPIGVGSYVHYDRKLDGRIAQGVVSINAFKGVSFGEGFKAAEKPGSEIQDE 

IHYNQDSGYFRATNHLGGFEGGMSNGMPIIVNGVMKPIPTLYKPLNSVDINTKEDFKATI 
ERSDSCAVPAASVVCEHVVAFELAKAVLEEFQSNHMDQLVAQIKERRQLNIEF*

Sequence 2041 

Contig_0701_pos_1174_2742, 

is similar to (with p-value 0.0e+00)

>gp:gp|Z99111|BSUB0008_149 Bacillus subtilis complete genome
(section 8 of 21): from 1394791 to 1603020. NID: g2633e99.
>gp:gp|Z97025|BSZ97025_8 Bacillus subtilis nprE, yla[A,B,C,D
,E,F,G,H,I,J,K,L,M,N,O] and pycA genes. NID: g2224758.
atggttgacggtgtcgtactagtggttgacgcatatgaaggtacaatgcctcaaactcgt 

tttgttcttaaaaaagctttagaacaaaacttaaaaccggttgtagttgtgaataaaatt
gataaaccagctgctagacctgagggagttgtagatgaagtattagacttattcattgaa
ttggaagcgaatgatgagcaattagacttcccagttgtttatgcttcagctgtgaatgga 
acagcaagtttagactctgaaaagcaagacgaaaatatgcaatccctatacgagacgatt 
attgactatgtaccggcaccagtagataattcagatgaaccattacaattccaaattgct 

ttactagattataatgattatgtaggtcgtataggcgttggacgtgtgttcagaggtaaa
atgcgtgtaggtgataatgtatcactaattaaattagatggtacagttaagaactttcgt 
gtgacgaaaatatttggttactttggtcttaaacgtgaagaaattgaagaagcacaagca 
ggagacttaatagctgtttcaggtatggaagatattaacgttggtgaaacagttacacca 
catgatcatcgtgacccattaccggtgttacgtattgatgaaccaaccctagaaatgact 

tttaaagtaaataactctccgtttgctggacgtgaaggtgattatgtaacagctcgacaa
attcaagaaagattagatcaacaacttgaaacagatgtttctttaaaagttacacctact 
gatcaaccagattcatgggttgttgctggtcgtggtgaactacacttgtctattcttatt 
gaaaacatgagacgtgaaggctttgaattacaggtttctaaacctcaagttattttaaga 
gaaatcgatggtgtgttaagtgaaccatttgagcgtgtacaatgtgaagtgccttctgaa 

aatgccggggcagtgattgagtcattaggtgcacgaaaaggtgaaatgttagatatgatg
acgaccgacaatggtttgacgcgtttaatctttatggtacctgcacgcggtatgattggt 
tatactactgaatttatgtctatgacacgaggttatggaattattaaccatacatttgaa 
gaatttagacctcgcgttaaagctcaaatcggtggtagacgtaacggtgcattgatttct 
atggaccaaggtcaagcaacatcttatgcgattattaacttagaagatcgtggtgttaac 

tttatggaaccaggtactgaagtatatgaaggtatgattgttggtgaacataaccgtgag
aacgatttaacagtaaatattactaaagcaaagcatcaaacaaacgtacgttcagctact 
aaagatcaaacacaaacgatgaatcgtcctagaattttaacattagaagaagcgttacaa 
tttatcaatgatgatgaattggtggaagtaactcctgaaagtattcgcttaagaaagaaa 
atacttaataaatctgcccgtgaaaaagaagcaaaaagagttaaacaattaatgcaagac 
gaacaataa

Sequence 2042

MVDGVVLVVDAYEGTMPQTRFVLKKALEQNLKPVVVVNKIDKPAARPEGVVDEVLDLFIE 
LEANDEQLDFPVVYASAVNGTASLDSEKQDENMQSLYETIIDYVPAPVDNSDEPLQFQIA
LLDYNDYVGRIGVGRVFRGKMRVGDNVSLIKLDGTVKNFRVTKIFGYFGLKREEIEEAQA 
GDLIAVSGMEDINVGETVTPHDHRDPLPVLRIDEPTLEMTFKVNNSPFAGREGDYVTARQ 
IQERLDQQLETDVSLKVTPTDQPDSWVVAGRGELHLSILIENMRREGFELQVSKPQVILR 
EIDGVLSEPFERVQCEVPSENAGAVIESLGARKGEMLDMMTTDNGLTRLIFMVPARGMIG 
YTTEFMSMTRGYGIINHTFEEFRPRVKAQIGGRRNGALISMDQGQATSYAIINLEDRGVN
FMEPGTEVYEGMIVGEHNRENDLTVNITKAKHQTNVRSATKDQTQTMNRPRILTLEEALQ 
FINDDELVEVTPESIRLRKKILNKSAREKEAKRVKQLMQDEQ* 

Sequence 2043

Contig_0701_pos_3196_4209,

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 
gcttgtttaggaccgacgcttaaacaaacagacagcttacctatacatgagttaatattc 

tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattgtttagt 
agtcgatattcaatcattgcactcaacatcgcagaaatctttactcattcagacatggtt 
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat 

cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta
aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa 
ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg 
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt 
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 

aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct 
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt
cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag 

Sequence 2044

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDSLPIHELIF 
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHSDMV 

LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL 
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF 
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH*

Sequence 2045 

Contig_0701_pos_6054_6374, 

putative peptide of unknown function

atgtaccggcgatggtatctttttcaactacagctacttgcttaccactttgctttaatg 
ttaacgcagcatgccaagctgcatgaccacttcctaaaaatactacatcatattgtttca 
ttaacactcatcctttcttatttttctatgagatgttttaatgtttgctctagttcttca 
aacacatatttcgtttcatcatactcagtcgtatcaaattcttgtggtaagcaacttgaa 

attgcttcaaaaacagcttcttgttgttgttgcccattgtcagttaacgtaattatcaac
tgtcgtttatcagattgttga 

Sequence 2046

MYRRWYLFQLQLLAYHFALMLTQHAKLHDHFLKILHHIVSLTLILSYFSMRCFNVCSSSS 
NTYFVSSYSVVSNSCGKQLEIASKTASCCCCPLSVNVIINCRLSDC*

Sequence 2047 

Contig_0702_pos_13157_13459,
putative peptide of unknown function 

gtgttctgtagtaggggcaaaaccaatttgataagttgtttgagttttagttttagttat
ctttctctctacttttttaggttggagatctacagacttgttttggccatcccttttaat 
ttttacagttgttttcgccgttttattttggtcaagaacatgtttgatatcatcaaaatt 
tttaattttatgattacctacctgaactattttatcacctttgtgcaatccagcttcatc 
agctggagatttcttcacaacttctcctatgacattggttggcgtaccttggtagtatgc
taa 

Sequence 2048 

VFCSRGKTNLISCLSFSFSYLSLYFFRLEIYRLVLAIPFNFYSCFRRFILVKNMFDIIKI 
FNFMITYLNYFITFVQSSFISWRFLHNFSYDIGWRTLVVC*

Sequence 2049

Contig_0702_pos_14035_12749,

is similar to (with p-value 2.0e-24) 

>sp:sp|P56136|Y258_HELPY HYPOTHETICAL PROTEIN HP0258. >gp:gp
|AE000545|HPAE000545_4 Helicobacter pylori section 23 of 134
of the complete genome. NID: g2313349. 
atgagcattttaattacgattatttcatttatcatcgtatttggtgtactcgtaactgtt 
cacgaatatggacacatgttttttgctaagcgagcaggaattatgtgtcctgaatttgcg

attggtatgggtcctaaaatttttagttttcgtaaagatgaaacattatatacaattcgt
ctattaccggtgggtggttatgtcaggatggctggtgatggtcttgaagaaccaccagtt 
caaccaggtatgaacgtaaaaataaagttaaataaccaagacgaaatcacacatataatt 
ctagatgaccaacataaattccaacaaattgaagccatagaagttaagaaatgtgatttt 
aaagatgacctatatattgaaggtatcacttcttatgatgatgaaaggcatcacttcact 

atagcgaaaaaggcattttttgtcgaaaatggaagccttgttcaaattgctccaagagat
agacagtttacacataagaaaccattgccaaagtttttaacattatttgcaggtccgtta 
tttaattttattttagctttagttctatttattggattagcatactaccaaggtacgcca 
accaatgtcataggagaagttgtgaagaaatctccagctgatgaagctggattgcacaaa 
ggtgataaaatagttcaggtaggtaatcataaaattaaaaattttgatgatatcaaacat 

gttcttgaccaaaataaaacggcgaaaacaactgtaaaaattaaaagggatggccaaaac
aagtctgtagatctccaacctaaaaaagtagagagaaagataactaaaactaaaactcaa 
acaacttatcaaattggttttgcccctactacagaacacagcgtttttaaaccaataagc 
tacggtatttataactttttcgataaaggtaagcttatttttacagctgttgttgatatg
ttagctagtatatttacaggagaattttcatttgatatgttaaatggccctgttggtatt

tatcacagtgttgattctgttgttaaatctggaattattaatttagtaggatacaccgct
ttattaagtgttaacttaggaataatgaatttgctacctattccagcgcttgatggtggt 
cgcatattatttgtactatatgaggctatttttagaaaaccagtgaataaaaaagcggaa 
acaggaattattgctgtaggcgcactttttgtggttattattatgattttagtcacttgg 
aatgatatacaacggtatttcttataa 

Sequence 2050 

MSILITIISFIIVFGVLVTVHEYGHMFFAKRAGIMCPEFAIGMGPKIFSFRKDETLYTIR 
LLPVGGYVRMAGDGLEEPPVQPGMNVKIKLNNQDEITHIILDDQHKFQQIEAIEVKKCDF 
KDDLYIEGITSYDDERHHFTIAKKAFFVENGSLVQIAPRDRQFTHKKPLPKFLTLFAGPL 
FNFILALVLFIGLAYYQGTPTNVIGEVVKKSPADEAGLHKGDKIVQVGNHKIKNFDDIKH
VLDQNKTAKTTVKIKRDGQNKSVDLQPKKVERKITKTKTQTTYQIGFAPTTEHSVFKPIS 
YGIYNFFDKGKLIFTAVVGMLASIFTGEFSFDMLNGPVGIYHSVDSVVKSGIINLVGYTA 
LLSVNLGIMNLLPIPALDGGRILFVLYEAIFRKPVNKKAETGIIAVGALFVVIIMILVTW
NDIQRYFL* 

Sequence 2051 

Contig_0702_pos_10795_6467,

is similar to (with p-value 0.0e+00)

>gp:gp|D86727|D86727_1 Staphylococcus aureus DNA for DNA pol
ymerase III, complete cds. NID: g1483181.

gtggttattattttggcaatgacaaatcgagaaaagtttaaagtgcttgccgatcaaata 
aaaatatcaaatcaactagaacaagatattcttgaacaaggtgaactcactcgtatagat 
gtttcaaataaaaacagaacatggactttccaaatatcactcccacattttttatctcat 
gaagattatcttctttttacacatgcaattgaagaagaatttaaagaaatagctacagta 
gcaattgatttttcaattaaagataccaacaatcaagatgagtttgctttaaaatatttc
ggacattgtattgatcaaacacgattgtcgccaaaagtgaaaggtcaattgaaacaaaaa 
aaactcattatgagtggaaatgttttaaaagtcttagtttcaaatgacattgagagaaat 
cattttgataaggcatgtaatggtagtttggttaaagcatttagacagtgtggctttgaa 
attgataaagtcgtttttgaaacagattcaacaaatcacgatgatgaccttgcatcgtta
gaagcacatattcaacaagaagatgaacaaagtgcaagagaagcaactgaaaaattagaa 
aaaatgaaagcagaaaaagcgaaacaacaagataataatgaaagtacagtggaaaaatgt 
cagattggaaaaccaattcagattgaaaatataaaaccaattgaatcaattattgaagaa 
gaattcaaagtagctattgaaggtgttatatttgatattaacctaaaagaacttaaaagt 

ggacgtcatatagttgagcttaaagttactgattacacagattcacttgtattaaaaatg
tttacaagaaaaaataaagatgacttggaccactttaaggcacttagtgttggtaaatgg 
gttagagctcaaggtcgtattgaagaagatacttttgttagggatcttgtcatgatgatg 
tcagatattgaagaaattaaaaagacacctaaacaagataaagcagaagataagcgtgta 
gagtttcatttacatacgtctatgagtcaaatggatggtattcctaatattagtgcatat 

gttgaacaagctgctaaatgggggcaccaagctttagcagtaacagatcacaacgtagta
caagcttttcctgatgcacataatgctgccgaaaaacatggtattaagatgatttatggt 
atggaaggtatgctagtagacgatggtgttcctatagcttataaaccaacagaccgtaat 
ttaaaagatgcaacatatgtggtgtttgacgtagagacaacaggtctttctaatcaatat 
gataaaattattgaattagctgcagtaaaagtgcataacggtgaaattatagataagttt 

gaacgttttagtaatccacacgaaagattatctgaaaccattatcaatcttacacatatc
actgatgatatgttaactgatgctcccgaaattgaagaagtgttaactgaatttaaagag 
tgggttggagatgctatatttgtagctcataatgcttcatttgatatgggatttattgac 
acaggatatgaaaggttaggctttggaccttctacaaacggtgtaattgatacacttgag 
ct*:tcacgtacaattaataccgaatatgggaaacatggtttgaatttccttgccaa'iaaa 

tatggtgtcgaattaacgcaacatcatagagcgatttatgatacagaagcaacag-..ttat
atttttataaaaatggttcaacaaatgaaagaactaggtgtgaacaaccatctagaaatt 
aataaaaaattaactaatgaagatgcatataaaagagctcgtccatctcacgttacactc 
attgttcaaaatcaagaaggtcttaaaaatttatttaaaatagttagtgcttcattagtt 
aagtattattaccgtacgccaagaattccacgttctcttttaaatgaatatcgagaaggg 

atcttgattggtacagcttgtgatgagggtgaattattcacagcagtaatgcagaaggat
cagtcggaagtagaaaaaatagcaaagttctatgattttatagaagttcaaccgcctgcg 
ctttatcaagatttaatggatagagaattaatacgagataatgaaacgttaacacaaatt 
tacaagcgattaatagatgctggtaaaagcgctaatatcccagtgattgctactggtaac 
gcgcattatctatatgaacatgatgctatagccagaaaaattttaattgcatcccaacca 

gggaatccattaaatcgttcaacattaccagaagctcactttagaaccactgatgaaatg
ttagatgattttcacttcttaggtgaagaaaaagcatatgaaatcgttgtaacaaataca 
aatgagctcgctaataaaattgaaaaagtggttcctataaaagataaactatttacgcca 
agaatggatggggctaatgaagaaattcgtgagttgagttattctaatgcgaaaaaacta 
tatggtgaagatttaccacaaattgttatagatcgccttgaaaaggaattagatagtatt 

attggtaatggcttttctgttatttacctcatatctcaacgtttggtgaagaaatcgcta
gatgatggttatttagttggatcgcgtggttcagttggttctagtttcgtagcaacaatg 
actgaaattacagaagttaatccgcttccaccacactacatttgttcacattgtaagaca 
agtgagttctttgatgatggttcggttggatctggattcgatttaccagataaaaaatgt 
cctacttgtggtaatgaattaattaaagaaggacaagatatcccttttgagacattcctt 

ggatttaaaggagataaagttccagatattgatttgaactttagtggtgaatatcaacct
aacgctcataattacacaaaagtattgtttggtgaagataaagtatttcgtgctggaaca 
ataggtactgttgctgaaaaaacagcttttggtttcgtaaaaggttacttaaatgatcaa 
ggtattcacaaacgtggtgctgaaattgatcggttggttaaaggttgtacaggggtcaaa 
cgtacaactggtcaacatcctggaggaatcattgttgtaccggattacatggatatttat 

gattttacaccgattcaattcccagcagacgaccaaagtgcagcgtggatgacaacccat
ttcgacttccattcaatacacgataatgtcttaaaattagatatattaggacatgatgac 
ccaacgatgattcgtatgttacaagacttatcaggaattgaccccaaaactataccagta 
gatgataaagaaacaatgcaaatatttagtggtcctgagagtttaggtgttacagaagac 
gaaatattatgtaagacaggtacatttggtgtaccagaatttggtactggatttgtacgt 

caaatgcttgaagatactaagccaacgacattctcagaattagttcaaatttcaggttta
tctcatggtacggacgtttggttaggtaatgcacaagagttaattcgtcaagggatatgt 
gacttatctagtgtgataggctgtcgtgatgatatcatggtatatctgatgtatgctgga 
cttgaaccgtcaatggcttttaaaacgatggaatttgtacgtaaaggtcgtggcttaaca 
gatgaaatggttgaagcgatgaaggaaaataacgtgccagattggtatttagattcttgt 
cgtaaaattaaatatatgttccctaaagctcatgccgctgcttatgtactgatggctgta
agaattgcatactttaaagtacatcatccactatattattatgcagcatactttaccata 
agagcttccgattttgaccttataacaatgattaaagataaaacgagtattcgtaataca 
gttaaagatatgtattcacgatatatggatttagggaaaaaagagaaagatgtattaact 
gtattagaaataatgaatgaaatggcgcatcgaggttttcgattgcaaccgattagttta
gaaaaaagccaagcttttgacttcatcattgaaggggatacattgattcctccattcatt
tcagtgccaggacttggagaaaacgttgcacaaagaattgttgaagcgagagaagaggga 
ccatttttatccaaagaagatttaaataaaaaagccggcttatctcaaaaggttattgac 
tatttagatgaattaggctcattgccagatttacctgacaaggcacaattgtcgatattt 
gatatgtaa

Sequence 2052 

VVIILAMTNREKFKVLADQIKISNQLEQDILEQGELTRIDVSNKNRTWTFQISLPHFLSH 
EDYLLFTHAIEEEFKEIATVAIDFSIKDTNNQDEFALKYFGHCIDQTRLSPKVKGQLKQK 

KLIMSGNVLKVLVSNDIERNHFDKACNGSLVKAFRQCGFEIDKVVFETDSTNHDDDLASL
EAHIQQEDEQSAREATEKLEKMKAEKAKQQDNNESTVEKCQIGKPIQIENIKPIESIIEE
EFKVAILGVIFDINLKELKSGRHIVELKVTDYTDSLVLKMFTRKNKDDLDHFKALSVGKW
VRAQGRIEEDTFVRDLVMMMSDIEEIKKTPKQDKAEDKRVEFHLHTSMSQMDGIPNISAY 
VEQAAKWGHQALAVTDHNVVQAFPDAHNAAEKHGIKMIYGMEGMLVDDGVPIAYKPTDRN 

LKDATYVVFDVETTGLSNQYDKIIELAAVKVHNGEIIDKFERFSNPHERLSETIINLTHI
TDDMLTDAPEIEEVLTEFKEWVGDAIFVAHNASFDMGFIDTGYERLGFGPSTNGVIDTLE 
LSRTINTEYGKHGLNFLAKKYGVELTQHHRAIYDTEATAYIFIKMVQQMKELGVNNHLEI 
NKKLTNEDAYKRARPSHVTLIVQNQEGLKNLFKIVSASLVKYYYRTPRIPRSLLNEYREG 
ILIGTACDEGELFTAVMQKDQSEVEKIAKFYDFIEVQPPALYQDLMDRELIRDNETLTQI 

YKRLIDAGKSANIPVIATGNAHYLYEHDATARKILIASQPGNPLNRSTLPEAHFRTTDEM
LDDFHFLGEEKAYEIVVTNTNELANKIEKVVPIKDKLFTPRMDGANEEIRELSYSNAKKL
YGEDLPQIVIDRLEKELDSIIGNGFSVIYLISQRLVKKSLDDGYLVGSRGSVGSSFVATM
TEITEVNPLPPHYICSHCKTSEFFDDGSVGSGFDLPDKKCPTCGNELIKEGQDIPFETFL 
GFKGDKVPDIDLMFSGEYQPNAHNYTKVLFGEDKVFRAGTIGTVAEKTAFGFVKGYLNDQ 

GIHKRGAEIDRLVKGCTGVKRTTGQHPGGIIVVPDYMDIYDFTPIQFPADDQSAAWMTTH
FDFHSIHDNVLKLDILGHDDPTMIRMLQDLSGIDPKTIPVDDKETMQIFSGPESLGVTED 
EILCKTGTFGVPEFGTGFVRQMLEDTKPTTFSELVQISGLSHGTDVWLGNAQELIRQGIC 
DLSSVIGCRDDIMVYLMYAGLEPSMAFKTMEFVRKGRGLTDEMVEAMKENNVPDWYLDSC 
RKIKYMFPKAHAAAYVLMAVRIAYFKVHHPLYYYAAYFTIRASDFDLITMIKDKTSIRNT

VKDMYSRYMDLGKKEKDVLTVLEIMNEMAHRGFRLQPISLEKSQAFDFIIEGDTLIPPFI
SVPGIGEHVAQRIVEAREEGPFLSKEDLNKKAGLSQKVIDYLDELGSLPDLPDKAQLSIF
DM* 

Sequence 2053 
Contig_0702_pos_5647_4577,

is similar to (with p-value 0.0e+00)

>sp:sp|P32727|NUSA_BACSU N UTILIZATION SUBSTANCE PROTEIN A H
OMOLOG (NUSA PROTEIN). >pir:pir|C36905|C36905 nusA homolog -
Bacillus subtilis >gp:gp|Z18631|BS0RF1T7A_2 B.subtilis infB
-nusA operon. NID: g49314.

atggacgaaggttcatttagagtgattgcacgtaaagaagtcgtagaagaagtgtttgat
gacagagatgaagttgatttaagtactgctttagtcaaaaatcctgcctatgaagtagga 
gatatttatgaacaagatgtaacaccgaaagacttcggacgtgtaggagctcaagcagct 
aagcaagctgtgatgcaacgacttagagacgcagaaagagaaattttatatgatgaattt 

atcgataaagaagaagatattctaacaggtgtgattgaccgtgtagaccatcgctatgta
tatgtgaatttaggaagaattgaagctgtgctgtcagaagctgaaagaagtcctaatgag
aaatatattcctaatgaacgtatcaaggtgtacgtaaataaagttgaacagactacaaaa
ggtccacaaatttacgtatcaagaagtcatcctggattactaaaacgcttattcgaacaa 
gaagttccagaaatttatgatggtactgttattgttaaatcagtagcgcgtgaagctgga 

gatcgttctaaaattagcgtgtattctgataatcctgatatagatgctgttggcgcatgt
gtaggttctaaaggagcacgagtagaagcggttgttgaagaacttggtggcgaaaaaatc
gatatcgtccaatgggatgaagatccgaaagtatttgttcgtaatgctttaagtccatca 
caagttttagaagtaattgttgatgaagagaatcaatcaactgtagttgtagttcctgat 
taccaattatccttagctataggtaaaagagggcaaaacgcacgtttagctgctaaatta 
acaagttggaagatagatattaaatcagaatctgatgcccgagaagctggaatttatcct
gttattgaatcagaagaagttgcagatgaaattgttaattccggtgacgaagatgttgag 
tttgataatgttaacttggaagagacaaacttaactagtacagaattagctgctgaaaat 
gatgaagataaaaaagataaaacagaagaagataatgacacagaatcatag 


Sequence 2054 

MDEGSFRVIARKEVVEEVFDDRDEVDLSTALVKNPAYEVGDIYEQDVTPKDFGRVGAQAA
KQAVMQRLRDAEREILYDEFIDKEEDILTGVIDRVDHRYVYVNLGRIEAVLSEAERSPNE
KYIPNERIKVYVNKVEQTTKGPQIYVSRSHPGLLKRLFEQEVPEIYDGTVIVKSVAREAG
DRSKISVYSDNPDIDAVGACVGSKGARVEAVVEELGGEKIDIVQWDEDPKVFVRNALSPS
QVLEVIVDEENQSTVVVVPDYQLSLAIGKRGQNARLAAKLTSWKIDIKSESDAREAGIYP 
VIESEEVADEIVNSGDEDVEFDNVNLEETNLTSTELAAENDEDKKDKTEEDNDTES* 

Sequence 2055 
Contig_0702_pos_4266_3958,

is similar to (with p-value 3.0e-21) 

>sp:sp|P32729|YLXQ_BACSU PROBABLE RIBOSOMAL PROTEIN IN NUSA-
INFB INTERGENIC REGION (ORF4). >pir:pir|E36905|E36905 hypoth
etical protein 2 (infB 5' region) - Bacillus subtilis >gp:gp
|Z18631|BS0RF1T7A_4 B.subtilis infB-nusA operon. NID: g49314
. >gp:gp|Z99112|BSUB0009_132 Bacillus subtilis complete geno
me (section 9 of 21): from 1598421 to 1807200. NID: g2633902

atgaaaatttttaatttgcttggtttagctatgagagctggtaaaatcaaaagtggcgaa 
tcggtcatcttaaatgagcttaaaaagaatcaaataaaacttgtcatattagctagcgat
gcatctagtaacactctaaaacaaatgaataataaatgtaatagttaccaagtgccatta
aaagtgtttggtactagaaatgaattagggttagcaataggtaaaagcgatagagtcaat
attggtataacagataatggttttgcaaaaaaattgttatcaatgatagatgaatatggt
aaggagtga 


Sequence 2056 

MKIFNLLGLAMRAGKIKSGESVILNELKKNQIKLVILASDASSNTLKQMNNKCNSYQVPL 
KVFGTRNELGLAIGKSDRVNIGITDNGFAKKLLSMIDEYGKE* 

Sequence 2057

Contig_0702_pos_3953_1791, 

is similar to (with p-value 0.0e+00)

>sp:sp|P17889|IF2_BACSU TRANSLATION INITIATION FACTOR IF-2.
>pir:pir|A35269|A35269 translation initiation factor IF-2 -
Bacillus subtilis >gp:gp|M34836|BACPSIF2A_1 B.subtilis prote
in synthesis initiation factor 2 (infB) gene, complete cds.
NID: g143358. >gp:gp|Z18631|BS0RF1T7A_5 B.subtilis infB-nusA
operon. NID: g49314.
atgagtaaaaaaagaatttacgaatatgcgaaagaattaaatctaaagagtaaagagatt 

atagatgagttaaaaagtatgaatgttgaagtgtcaaatcatatgcaagctttagaagaa
gaacaaatcaaagcattagataaaaaatttaaagcctctcaagcgaaagacactaataaa 
caaaatactcaaaataatcaccaaaaatctaataataaacaaaattctaacgataaagaa 
aaacaacaaagtaagaataatagtaaaccaacgaagaaaaaagaacaaaacaacaaagga 
aaacagcaaaataaaaacaataaaactaataagaatcaaaaaaacaataaaaataaaaag

aataataaaaataataaacctcaaaatgaggtagcagaaacaaaagaaatgccctctaaa
atcacttatcaagaaggcataactgtcggtgagttagctgaaaagctaaatgtagaatca 
gctggtattattaaaaaattgttcttactaggtattatggctaatatcaatcaatcattg 
gatgaagaaacattagaattaattgcagatgactatggcgttgaaatagagaaagaagta 
gtcgttgatgaagaagatttatcaatttattttgatgatgagactgatgattctgatgca 

attgaacgtccagcagttgttacaatcatgggccacgtagaccatggtaaaacgacttta
ttagattctattcgtaacactaaagttacagaaggagaagctggcggaatcactcaacat 
attggtgcttatcaaattgaaaattcaggtaaaaaaattacgttcttagatactcctgga 
catgctgcatttacgactatgcgtgcacgtggtgctcaagttactgatattacaatttta 
gtcgtggccgctgatgatggtgtgatgcctcaaacaattgaagctataaatcacgctaaa 
gaagcagaagtacctacgattgttgcagtaaacaaaattgataaaccaactgctaaccct
gatcgtgttatgcaagaactcactgagtatggattaattccagaagactggggcggtgac 
acaatctttgtaccactatctgcattgagtggagacggtattgatgatttattagaaatg 
atcggtttagtagcggaggtacaagaacttaaagctaatcctaataaacaagctgtaggt 
actgtgattgaggctgaattagataaatcacgaggtccagctgcatctttacttgttcaa
aatggtactttaaacgttggagatgcaattgttgtaggtaatacttatggacgtatacgt 
gcaatggttaatgatttaggaaaaagaattaaatctgccggtccttcaacacctgtagaa 
attactggtattaacgatgttccacttgcaggtgatcgttttgttgtatttggtgatgaa 
aaacaagcacgtcgaattggtgaagcacgtcatgaggcaagtgtcatacagcaacgtcaa 

gaaagtaaaaatgtttcattagacaatttatttgagcaaatgaaacaaggtgaaatgaaa
gatttaaatgtcatcattaaaggtgatgtacaaggttcagttgaagcattggccgcatct 
ctaatgaaaatagatgttgaaggtgtgaatgtacgaattattcatacagctgttggtgct 
atcaatgaatcagatgttacattagcaaatgcatcaaatggtattattattggttttaat 
gtacgcccagatgcaggtgcgaaacgtgcggctgaagctgaaaatgtagatatgcgatta 

cacagagttatctataatgttattgaagagatagaatcagctatgaaaggtttacttgac
ccagaatttgaagagcaagtcattggacaagctgaagtgcgtcaaacatttaaagtttct 
aaagttggtacaattgctggtagttatgtgactgaaggtaaaatcactcgtaacgctggt 
gtacgcgtaattagagatggtatcgtgttatttgaaggtgaacttgacacattaaaacgt 
ttcaaagatgatgctaaagaagtagctcaaggctatgaatgtggtattacaattgaaaaa 

tataatgatctcaaagaaggagacattattgaagcgtttgaaatggtagaaattcaaaga
taa 

Sequence 2058 

MSKKRIYEYAKELNLKSKEIIDELKSMNVEVSNHMQALEEEQIKALDKKFKASQAKDTNK 
QNTQNNHQKSNNKQNSNDKEKQQSKNNSKPTKKKEQNNKGKQQNKNNKTNKNQKNNKNKK
NNKNNKPQNEVAETKEMPSKITYQEGITVGELAEKLNVESAGIIKKLFLLGIMANINQSL 
DEETLELIADDYGVEIEKEVVVDEEDLSIYFDDETDDSDAIERPAVVTIMGHVDHGKTTL
LDSIRNTKVTEGEAGGITQHIGAYQIENSGKKITFLDTPGHAAFTTMRARGAQVTDITIL 
VVAADDGVMPQTIEAINHAKEAEVPTIVAVNKIDKPTANPDRVMQELTEYGLIPEDWGGD 
TIFVPLSALSGDGIDDLLEMIGLVAEVQELKANPNKQAVGTVIEAELDKSRGPAASLLVQ
NGTLNVGDAIVVGNTYGRIRAMVNDLGKRIKSAGPSTPVEITGINDVPLAGDRFVVFGDE 
KQARRIGEARHEASVIQQRQESKNVSLDNLFEQMKQGEMKDLNVIIKGDVQGSVEALAAS 
LMKIDVEGVNVRIIHTAVGAINESDVTLANASNGIIIGFNVRPDAGAKRAAEAENVDMRL
HRVIYNVIEEIESAMKGLLDPEFEEQVIGQAEVRQTFKVSKVGTIAGSYVTEGKITRNAG 
VRVIRDGIVLFEGELDTLKRFKDDAKEVAQGYECGITIEKYNDLKEGDIIEAFEMVEIQR



Sequence 2059

Contig_0702_pos_1521_1138,

is similar to (with p-value 2.0e-34)

>sp:sp|P32731|RBFA_BACSU RIBOSOME-BINDING FACTOR A (P15B PRO
TEIN). >pir:pir|G36905|G36905 protein P15B homolog - Bacillu
s subtilis >gp:gp|Z18631|BS0RF1T7A_7 B.subtilis infB-nusA op
eron. NID: g49314. >gp:gp|Z99112|BSUB0009_135 Bacillus subti
lis complete genome (section 9 of 21): from 1598421 to 18072
00. NID: g2633902.

gtgagtaaagtaaaaataaagagaggtgagatgatgaataatataagagcagaacgtgta 
ggagaacaaatgaaacaggaaatcatggacattgttaataataaagttaaagaccctaga 
gttggttttttaacaattactgatgttgaactaaccaatgacctttcacaagcaaaggta 
tatttaacagtgttagggaatgataaagaagttgataatacgtttaaagctttgcataaa
gcaactgggtttataaaatctgaacttggttctcgaatgcgcctaagaattatacctgag 
ttaacattcgaatatgatgaatctatcgaatacggtaataagatagaacgcatgattcaa 
gagttacacaaaaatgataaataa 

Sequence 2060

VSKVKIKRGEMMNNIRAERVGEQMKQEIMDIVNNKVKDPRVGFLTITDVELTNDLSQAKV
YLTVLGNDKEVDNTFKALHKATGFIKSELGSRMRLRIIPELTFEYDESIEYGNKIERMIQ
ELHKNDK*
Sequence 2061

Contig_0702_pos_1000_83,

is similar to (with p-value 6.0e-46)

>gp:gp|X92946|LLLPK214_14 Lactococcus lactis sp. lactis plas
mid pK214, complete sequence. NID: g2467210.

atgtataatggcatactaccggtatttaagaaacgaggtttaacaagtcacgacgtcgtt 
tttaaattacgtaaaattttaaaaatgaaaaaaattggtcatacaggaacattagatcct 
gaagttaatggtgtgttaccaatttgtttaggcgatgcgacaaaagtcagtgattatatc 
atggaaatgggaaaaacttatcacgctatgataacgctaggaaagagtacgactactgaa 

gaccaaactggagatattttagaaactagggctgttgataagaatgatattaatgaagat
acgattgaccaagtgttgcagcaatttgaggggcatattcaacaaattccgcctatgtat 
tcttctgttaaagtaaatggaagaaaattatatgaatatgcgagaaataatgaaactgtc 
gaacgccctaaacgacaagtttttattaaagatatacatagaatatctgaagttactttt 
caggagcagacatgtcattttgaagttgaagtaacatgtggtaaaggaacttatattaga 

actttagctacagatattggacttaaacttggttttccagctcatatgtcacgtctaact
agaattgcttctggcggttttcaattagaaagtagtttaacgattgatcaaattaaagaa 
ttacatgagcatgattcattacataatgaattgtttcctatagaatatggcttaaaaggt 
ctgaaatcattccaagtgaaagattcaaattttaaaaagaaaatctgtaacggtcaaaaa 
tttcataaaaaagtgttaagtcaaaatgttaaagaaccttttatatttgtcgatagtagc 

actcaaaaagttttagcaatatatatagttcatccagataaaccttatgaaataaaacct
aaaaaagtttttaattaa 

Sequence 2062 

MYNGILPVFKKRGLTSHDVVFKLRKILKMKKIGHTGTLDPEVNGVLPICLGDATKVSDYI 
MEMGKTYHAMITLGKSTTTEDQTGDILETRAVDKNDINEDTIDQVLQQFEGHIQQIPPMY
SSVKVNGRKLYEYARNNETVERPKRQVFIKDIHRISEVTFQEQTCHFEVEVTCGKGTYIR 
TLATDIGLKLGFPAHMSRLTRIASGGFQLESSLTIDQIKELHEHDSLHNELFPIEYGLKG 
LKSFQVKDSNFKKKICNGQKFHKKVLSQNVKEPFIFVDSSTQKVLAIYIVHPDKPYEIKP 
KKVFN* 


Sequence 2063 

Contig_0703_pos_784_1236,

putative peptide of unknown function 

atgtatgatgaattagaggttaataaaagtcgttttaaaaactgtaattttaatgaaggt 
atttttaagaatatagaagcaatttgtaattgtaaatttacaacgtgcgggtttaataat
tgtattttcgaagatgttcatttttacaaaaaccaatttaaagattcaacatttgtgaat 
acaccatttgatcaatccgtatttaatagtactttattccaaaatgcaatgttcgatagc 
aatctcattcgtagcgtaaaatggactgatatcatttttaaaaacgtttctttcaaaaat 
gtagaaattgaaggaacaacatttaaagatgtaaaattcaaaaattgtgagttcaaaaat 
gtaattattactaattcaactatgtcgcaaaagttaatgaatgaattacaaaaacaagat
gttactttagaaaatatagacacttctatttaa 

Sequence 2064 

MYDELEVNKSRFKNCNFNEGIFKNIEAICNCKFTTCGFNNCIFEDVHFYKNQFKDSTFVN 
TPFDQSVFNSTLFQNAMFDSNLIRSVKWTDIIFKNVSFKNVEIEGTTFKDVKFKNCEFKN
VIITNSTMSQKLMNELQKQDVTLENIDTSI* 

Sequence 2065 

Contig_0703_pos_4474_5274,

putative peptide of unknown function

atgaaagatagggttgaaacagaagaatatgctagaaatcaattaatctctaaaaattca 
attttaagtgaagaaaatttatcattgaaaaaccaaatgttaagtacaaacaatgacgtc 
ggtcaacacgcttttaaaaacgccaagcgtgaattaagaaaaatattaaatagatttaaa 
gaagagggtcgtttacgatcatatacaattgttcctacgagtaatttggctgttaaacat 

ccccttttcgaatatgcacgttcattcgattttattatcattactgatgttggtttgata
aatgtggatgttaaaaattggaaccaaaaaacgttttatcattttgatgtgccagatcaa 
catcttgaagaaggacaaccacaatataataccgaaaaagttgtcggtcattatattagc 
aatcgatatcatagtcagtttaaaacaacacgttctggtgtctatacttttattgagatt 
ttacaggataatcgtgtaatatatgaattttatgaccacgatccatacgataaagccgca 
aacaatgcaaaagcattaaaagataaaattgaaaatgattataattttaaaattcaaagt
attggcgtcatatattttagtgatggtagcgttaatattattgaaggatccgacgagagt 
gataaatacgtcgacaccgtatctacaccgatatcacttgaaaaagtaattgaggaagct 
atcgatttatctaagcacccccttactgataaacaaatcgaagaaatttctgaaaacttt 
aaacaacatatgaataattaa

Sequence 2066 

MKDRVETEEYARNQLISKNSILSEENLSLKNQMLSTNNDVGQHAFKNAKRELRKILNRFK 
EEGRLRSYTIVPTSNLAVKHPLFEYARSFDFIIITDVGLINVDVKNWNQKTFYHFDVPDQ 
HLEEGQPQYNTEKVVGHYISNRYHSQFKTTRSGVYTFIEILQDNRVIYEFYDHDPYDKAA
NNAKALKDKIENDYNFKIQSIGVIYFSDGSVNIIEGSDESDKYVDTVSTPISLEKVIEEA 
IDLSKHPLTDKQIEEISENFKQHMNN* 

Sequence 2067 
Contig_0703_pos_3987_3481,

putative peptide of unknown function 

atgcatctatttctgaatgatgattttatattaccatcgaaaatagatatttgtgttgag 
cgcaaaaagttattggaaattatcagggtgattcctgaagaatttacgattcactattat 
gataatcagttatatgaaagattaagagaatctttatctttatcagatgttgaacatgtt 

aaagtctttaaaaacgatacagaagtcatgacaatatttgtttatgatgttgtaaatgat
gaatggctatttaggttagatcatcatatacgtttaccaaaaaataatatatattttcat 
tctttaagttggaacgtagattatattaagccagaaatagttcttatgtatgatttaatg 
agtgaacaaaaatatcatcagtttagtaattataaagctgttattgattctcttagttat 
tatcaattctatattttaaaattggtagtaggtgaacagcgtattaaaaaagctatagta 

aatagttccactaaaaagatatcttaa

Sequence 2068 

MHLFLNDDFILPSKIDICVERKKLLEIIRVIPEEFTIHYYDNQLYERLRESLSLSDVEHV 
KVFKNDTEVMTIFVYDVVNDEWLFRLDHHIRLPKNNIYFHSLSWNVDYIKPEIVLMYDLM 
SEQKYHQFSNYKAVIDSLSYYQFYILKLVVGEQRIKKAIVNSSTKKIS*

Sequence 2069 

Contig_0703_pos_2163_1420,

putative peptide of unknown function 

atgtcgtcattatgccttttgcctggagctgaaacgataatgattttacgttcagggtct
tcattcactataggacaatcgttttttacgctgtccctaggtacaaccggaatgattact 
tatgcaagctatgcacctaaaaatatgacgataaagtcttcagcactttcaattgtcgta 
atgaatattttaatttctgtcttggctggattagctatatttcctgcgcttaaaacattt 
ggttaccaaccccaagaaggccctggcttattatttaaggttttaccactagtatttagc 

gaaatgacttttggtacattcttttactttatatttttactattattcttatttgcggca
ttaacgtcttctatatcattattagagttaaatgtatctaattttactaaaaatgataat
agtaaaagacaaaaagtggcaatcataggtagtatacttgtatttatcattagtacccca 
gcaacattatcttttagtagtctaagtcatttgcgttttggcgctggtacgatatttgat 
aatatggattttattgtatctaatattcttatgccattaggggcactaggaacaacatta 

gtggttggccaattactagataaaaaattattaaaagaaagctttgggaaagacaaattc
aacctatttttaccgtggtattatttaattaagttcatcatgcctattgttattatttta 
gtatttatagttcaattattttaa 

Sequence 2070 

MSSLCLLPGAETIMILRSGSSFTIGQSFFTLSLGTTGMITYASYAPKNMTIKSSALSIVV
MNILISVLAGLAIFPALKTFGYQPQEGPGLLFKVLPLVFSEMTFGTFFYFIFLLLFLFAA 
LTSSISLLELNVSNFTKNDNSKRQKVAIIGSILVFIISIPATLSFSSLSHLRFGAGTIFD 
NMDFIVSNILMPLGALGTTLVVGQLLDKKLLKESFGKDKFNLFLPWYYLIKFIMPIVIIL 
VFIVQLF* 


Sequence 2071 

Contig_0703_pos_0_399,

is similar to (with p-value 9.0e-32) 

>gp:gp|U93874|BSU93874_1 Bacillus subtilis cysteine synthase
(yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), YrhD
(yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF),
formate dehydrogenase (yrhG), YrhH (yrhH), regulatory protei
n (yrhI), cytochrome P450 102 (yrhJ), YrhK (yrhK), hypotheti
cal protein YrhL (yrhL), putative anti-SigV factor (yrhM), R
NA polymerase sigma factor SigV (sigV) and YrhO (yrhO) genes
, complete cds, and YrhP (yrhP) gene, partial cds. NID: g193
4604. >gp:gp|Z99117|BSUB0014_206 Bacillus subtilis complete
genome (section 14 of 21): from 2599451 to 2812870. NID: g26
34966.

atgattgcatacgatttgataggacaaactccattagttttattagaaagctttagtgac 
gagaatgttaaaatatacgccaaacttgagcaatttaatcctggtggtagcatcaaagac 
cgtctagggaagtacttaattgaaaaagcaatagatgaaggacgacttaaagaaggggat 
acaatagtcgaagcgactgctggtaatacaggcattggacttgctattgcttctaatcgg
cacaaagtaaaatgtatcatctttgctccagaaggatttgcagaagaaaaaatttcaatt
atgaaagcattgggtgcagatgttagacgtacccccaaagctgagggaatgactggcgca 
cagcaagaggcgttggcatacgcaacacgatatgTACTA 

Sequence 2072 

MIAYDLIGQTPLVLLESFSDENVKIYAKLEQFNPGGSIKDRLGKYLIEKAIDEGRLKEGD
TIVEATAGNTGIGLAIASNRHKVKCIIFAPEGFAEEKISIMKALGADVRRTPKAEGMTGA 
QQEALAYATRYVL 

Sequence 2073 
Contig_0704_pos_804_1115,

putative peptide of unknown function 

gtgttatacagaaaaatggttgctaaaataaaacttaaacgcctatacaaaccctattca 
caaactttatttttgttgttcattttaaaatatttttactattttactataatccttgtt 
ttttattccaaatactttacatcatccttgggtaaagaaggattgattcttatcctcatt 
aataataaatgtaattataaaagcctttccgtgaactcaataagtctgaattctaaaaag
cgaaacagaaatcttatcatattcttctttgtttcaatccattaccgaaccccaacttgc 
tttgtctgttga 

Sequence 2074 

VLYRKMVAKIKLKRLYKPYSQTLFLLFILKYFYYFTIILVFYSKYFTSSLGKEGLILILI
NNKCNYKSLSVNSISLNSKKRNRNLIIFFFVSIHYRTPTCFVC* 

Sequence 2075 
Contig_0704_pos_2016_0, 
is similar to (with p-value 3.0e-76)

>sp:sp|P47994|SECA_STACA PREPROTEIN TRANSLOCASE SECA SUBUNIT
. >pir:pir|S47149|S47149 secA protein - Staphylococcus carno
sus >gp:gp|X79725|SCSECA_2 S.carnosus (TM300) secA gene. NID
: g499333.

atgacctgtttagatagaataactggtcgtatgctacctggaacaaagcttcagtctggt
ttacatcaagctatagaggctctggaaaatgttgaaatttctcaagatatgagtgtgatg 
gcaaccataacattccaaaacttatttaagcaatttgatgaattttcaggtatgactgga 
acaggtaaattaggggaaaaagaattctttgatttatattcaaaagttgttatagagatt 
ccgactcacagtccgattgaacgagatgatagacctgatagagtatttgctaatggtgac 

aaaaagaacgatgcaattttaaagacagtgattggtatacatgaaactcaacaacctgtg
ttactaattacacgtactgcagaagcggcagaatatttttcagctgagttatttaaacgt 
gatatacccaacaatttattaatcgctcaaaatgtagctaaagaggcacaaatgati'gct 
gaggc:gcc,acaattatctgcagttactgttgctacaagtatggcagggcgtggaa^.cgat 
ataaagttatcaaaagaggttcatgatatcggtggcttagcagtgattattaatgaacat 

atggataatagccgtgttgatcgtcaattaagaggacgctcaggtcgccaaggagatcct
ggatattcacagatttttgtatcacttgatgatgatttagtaaaacgttggagtaactct 
aacttggcagaaaataaaaacctccaaacgatggatgcatctaaactagaaagtagtgca 
ctctttaaaaaacgtgtaaagtcaattgttaataaagcgcaacgtgtatctgaagagact 
gctatgaaaaatagagaaatggcaaatgaattcgaaaaaagtattagtgttcaacgagat 
aaaatttatgctgaacgtaatcacatacttgaagcaagcgattttgatgattttaatttt
gaacagcttgcacgagatgtgtttacaaaagacgttaaaaatcttgacttaagtagtgaa 
cgtgcacttgtgaattatatatacgaaaacttaagttttgtcttcgatgaagatgtatca 
aatattaatatgcaaaatgatgaagaaatcatacaattcttaatacaacaatttactcaa 
caatttaacaatcgtttagaagttgctgctgattcatatttaaaacttcgtttcattcaa
aaatcaattttgaaagcgatagatagcgaatggattgaacaagtagataacttacaacaa 
cttaaagccagtgtaaacaatcgacaaaatggacagcgtaatgtcatttttgaatatcat 
aaagtggctcttgaaacgtatgaatatatgtctgaagatataaaaaggaagatggttaga 
aatttatgtttaagtattctagcctttgataaggacggagata 


Sequence 2076 

MTCLDRITGRMLPGTKLQSGLHQAIEALENVEISQDMSVMATITFQNLFKQFDEFSGMTG 
TGKLGEKEFFDLYSKVVIEIPTHSPIERDDRPDRVFANGDKKNDAILKTVIGIHETQQPV 
LLITRTAEAAEYFSAELFKRDIPNNLLIAQNVAKEAQMIAEAGQLSAVTVATSMAGRGTD 

IKLSKEVHDIGGLAVIINEHMDNSRVDRQLRGRSGRQGDPGYSQIFVSLDDDLVKRWSNS
NLAENKNLQTMDASKLESSALFKKRVKSIVNKAQRVSEETAMKNREMANEFEKSISVQRD 
KIYAERNHILEASDFDDFNFEQLARDVFTKDVKNLDLSSERALVNYIYENLSFVFDEDVS 
NINMQNDEEIIQFLIQQFTQQFNNRLEVAADSYLKLRFIQKSILKAIDSEWIEQVDNLQQ 
LKASVNNRQNGQRNVIFEYHKVALETYEYMSEDIKRKMVRNLCLSILAFDKDGDX


Sequence 2077 

Contig_0704_pos_1840_1244,

is similar to (with p-value 3.0e-87) 

>sp:sp|P24277|RECR_BACSU RECOMBINATION PROTEIN RECR. >gp:gp|
D26185|BAC180K_85 B. subtilis DNA, 180 kilobase region of re
plication origin. NID: g467326. >gp:gp|X17014|BSRECM_3 Bacil
lus subtilis dnaZX and recR genes and two unidentified readi
ng frames. NID: g453238. >gp:gp|Z99104|BSUB0001_21 Bacillus
subtilis complete genome (section 1 of 21): from 1 to 213080
. NID: g2632267.

atgcattatccagaacctatatcaaagcttatcgatagttttatgaaactgccaggcatt 
ggaccaaagacggctcaacgtctggcttttcatactttagatatgaaagaagacgatgtt 
gttaagtttgctaaagcactagttgatgttaaaagagaacttacctattgtagtgtttgt 
gggcatattacagaaaatgatccttgttatatatgtgaagataaacagcgagatcgttct 

gtcatatgtgtagttgaagatgacaaggatgtcatagcaatggaaaaaatgcgtgaatat
aaaggtttatatcacgtgcttcatggttcgatttcaccaatggatggtattgggcctgaa 
gacatcaatatacctgcattagttgaacgcctcaaaaacgatgaggtgaaagagcttata 
ttagctatgaatcctaacctagaaggcgagtctactgcaatgtatatatctaggttggtt 
aaaccaattgggattaaagtcacaagactggcacaaggtttatctgtaggcggcgattta 

gaatatgctgatgaagtgactttatctaaagcaattgcaggtagaacggaaatgtaa

Sequence 2078 

MHYPEPISKLIDSFMKLPGIGPKTAQRLAFHTLDMKEDDVVKFAKALVDVKRELTYCSVC
GHITENDPCYICEDKQRDRSVICVVEDDKDVIAMEKMREYKGLYHVLHGSISPMDGIGPE
DINIPALVERLKNDEVKELILAMNPNLEGESTAMYISRLVKPIGIKVTRLAQGLSVGGDL
EYADEVTLSKAIAGRTEM* 



Sequence 2079 

Contig_0706_pos_183_1418,

is similar to (with p-value 0.0e+00)

>sp:sp|P55189|YBAR_BACSU HYPOTHETICAL 46.4 KD PROTEIN IN RRN
G-FEUC INTERGENIC REGION. >gp:gp|D84213|BACTHRTRNA_3 Bacillu
s subtilis genome, trnI-feuABC region. NID: g1256147.
gtggcgcttttagtcacacctttagttaaagaacatggtgttgaatatctttttgctgct 

acgatattgatggggttaatacaattacttttaggaatacttaaagtcggtcgtttaatg
aaatttattccccgtccagttatgattggatttgttaatgcattgggtattatgattttt 
atgtctcaaatagagcatatttttaatatatctatagcaacttatatatacgttattata
actttactaattgtatatgttattcctagattttataaagctatacctgccccattaata 
gctataattgttttaacagcattgtatatgtatacaggatctgacgtaagaactgtaggt 
gatttaggtaatattaagcaaactttgcctcatttcttaatccctaacattccttttaat
ttagaaacacttcaaattattttcccatattcattatcaatggctattgtgggtctagta 

gaaagcctacttacagctaaaattgtagatgatgcaactgatacacacagtagtaaaaac 
aaagaatcaagaggtcaagggattgctaatattattactggtttctttggcggtatggga 
ggttgtgccatgattggacaatcagtaatcaatgttaagtcaggtgcaaatagtagattg
tctacatttacagcaggtattgtacttatgtttatgattattgttctcggtggtgtagtc 
gttcaaatcccgatgccaattcttgcagggataatggttatggtttctatcgggactgtt 
gattggaattcatttaagtatataaaaaaagcacctaaaacagatgcattcgttatggtt 
ttaacggttattattgtattaataacacataacttagctatcggtgtagttgtaggtgtt 

gttttcagtgctatattctttgcgactaaaatttctaaagttgaagttatttataaagag
ttaggtaagcagcatcgtttttcttttaaaggtcagatattctttgtatcaattgattct 
atgatggagaaaattgattttaatattgaagatagtgtcatagtgttgaattttgatcac 
gctcacctatgggatgattcagcagtaaatgccattgatacgattgttaggaaatttgaa 
gagaaaaacaatattgtgtatgttgaaaaattaaacgcagatagccgaaaaattatttca 

gaactaagttatttaaatgaaactcatttaaattaa

Sequence 2080 

VALLVTPLVKEHGVEYLFAATILMGLIQLLLGILKVGRLMKFIPRPVMIGFVNALGIMIF 
MSQIEHIFNISIATYIYVIITLLIVYVIPRFYKAIPAPLIAIIVLTALYMYTGSDVRTVG 
DLGNIKQTLPHFLIPNIPFNLETLQIIFPYSLSMAIVGLVESLLTAKIVDDATDTHSSKN
KESRGQGIANIITGFFGGMGGCAMIGQSVINVKSGANSRLSTFTAGIVLMFMIIVLGGVV 
VQIPMPILAGIMVMVSIGTVDWNSFKYIKKAPKTDAFVMVLTVIIVLITHNLAIGVVVGV 
VFSAIFFATKISKVEVIYKELGKQHRFSFKGQIFFVSIDSMMEKIDFNIEDSVIVLNFDH 
AHLWDDSAVNAIDTIVRKFEEKNNIVYVEKLNADSRKIISELSYLNETHLN* 


Sequence 2081 

Contig_0706_pos_1422_1844,

putative peptide of unknown function 

gtgttttttatgtataattcaattttgcttgcagcagatggttcaaaaaatagtatacgt 
gcagcccaagagttactaaattttataggtgattatacaattgtcacaatacttacggtt
gtagatattgaagaatcaaaaacagatgttttacatgatcaccaaggaactaatctaact 
caaaaaagggaaagtaaattacaatctattaaagatttatttacagaacacaatgtgaat 
tacaaaattaaaattgttcacggtactccaacagataaagtggttgaagtttcaaatagt 
ggtgaatatcaagctatcattttaggtacacgtggcttaaacagttttcaaaaaatggtt 
cttggtagtgtaagtcataaagttgctaaacgttctcaaatacctgtaattattgttaaa
taa 

Sequence 2082 

VFFMYNSILLAADGSKNSIRAAQELLNFIGDYTIVTILTVVDIEESKTDVLHDHQGTNLT
QKRESKLQSIKDLFTEHNVNYKIKIVHGTPTDKVVEVSNSGEYQAIILGTRGLNSFQKMV
LGSVSHKVAKRSQIPVIIVK* 

Sequence 2083 
Contig_0706_pos_5297_6436,

putative peptide of unknown function

atgaaaaaaatatggattttaactataggtatgtttgccttaggtatggatgcttatata 
gtagcaggattaataccttcaataagtaaaagttttaataaaagtagctctgctattggg
caaggagtaacagtttttacattgtttttctctatctctgcccccattttttcaacaata 
ttagctaaatccccagttaaaaaaatactaataatagcattcagtatatttactttagcc 

aatattataaccgcaatatctatgaactacatgctatatatcgtatcaagagcaatcgct
ggtttaggagctggcgtattctcaccaattgcaataagtgcaagcaatcatttagtctcc 
gaaaagcataaaggaaaagcaatcgcttttacagtaggcggaatgagtgtaggaactgtt 
ataggagttcctctcggactagaaattgccaacatttctaattggcgatttgcaatgttg 
gttattattgtcattagttttattgcattaataagcatatctatattgatgcctaaattt 

55 aar:ataaaagctcctccaaatttaaaagatcgttttcaattatttttaaacaagcatgta 
ctaagagttatttcggttacattatgcgctgccattgctagtttaggtttgtatacttat
ttagccgatattattaaaacaaatacagatacaaaaaatttaactcattaccttacagcg 
tggggaataggcggattaataggaagttttggtataggatttattatagatagatttaaa 
aatacaagatttgttatgctaattattttaattttactagcattaagttttggtttaatt 
cctatttctattaacttgcctatattaggtttaattccctttattttatggggagctatg
ggatgggctacacaagctcctcaacaacatatattattgaaaaaacatcctgaatatgga 
ggctctgctgtcgctttaaatagttctattaattatttaggcagtgctatgggatcagca 
atcggaggaattattttatttaatgctaatagtacaaatgtactaatatatagtgcttta 
ggaattactattattggtattttattacaattactaaatttatccctagaaaaaaattaa



Sequence 2084 

MKKIWILTIGMFALGMDAYIVAGLIPSISKSFNKSSSAIGQGVTVFTLFFSISAPIFSTI 
LAKSPVKKILIIAFSIFTLANIITAISMNYMLYIVSRAIAGLGAGVFSPIAISASNHLVS
EKHKGKAIAFTVGGMSVGTVIGVPLGLEIANISNWRFAMLVIIVISFIALISISILMPKF 
KIEAPPNLKDRFQLFLNKHVLRVISVTLCAAIASLGLYTYLADIIKTNTDTKNLTHYLTA 
WGIGGLIGSFGIGFIIDRFKNTRFVMLIILILLALSFGLIPISINLPILGLIPFILWGAM 
GWATQAPQQHILLKKHPEYGGSAVALNSSINYLGSAMGSAIGGIILFNANSTNVLIYSAL 
GITIIGILLQLLNLSLEKN*

Sequence 2085 

Contig_0706_pos_5156_4323,
putative peptide of unknown function 

atgggggattttatgaattttaattatatgaaggacttcattaaagttgtcgagtacaat 
agcttaaataaagcttctagagaattgaatataagtacaccagccttaagtaaaagaatt 
agaagtattgaagattattttgattgtcagttattttatagaacttctaagggtattttt 
ttaactcaaaaaggaaatgctgtttatcattcgtttttaaaaataaacgagcaatttgaa 
gaattaaaaagtaaaataagtgagtctaaagataaaagaattaaattaggagtcatacct 
agcttttctttatataagctacatgaaaagaattatattaatgaaagtgtcgttttagtt 
attgaaaatagtacttcaattttacttgaggaaatttataaagataatatagatgttgtt 
ati 3c,tg^tattacacatttaaaaaataactcgctctatacacaagaaatttacaf;•i;gag 
gactttattgtggcttatggtgatgaaaataagtttaaaaacactaaccaagtaagtata 
aaatctttaaaaaatgagaatatttatatacaaacaccgccatgtgatacatatgatctt 
cttaaaaattattccttgaaagataagttacatattgattatgttgattattatgaaact 
atattagctaatgtcaaagcaaataaaggtattactttgactcctcaatctttgactaca 
agacttgaaagtatgaacttatatcaaaaaaaattacaaagttataaaagagttgtaggc 
gtagtatcacgtgataaagataagatgaataaaattattgatataattcagtaa 

Sequence 2086 

MGDFMNFNYMKDFIKVVEYNSLNKASRELNISTPALSKRIRSIEDYFDCQLFYRTSKGIF 
LTQKGNAVYHSFLKINEQFEELKSKISESKDKRIKLGVIPSFSLYKLHEKNYINESVVLV 
IENSTSILLEEIYKDNIDVVIGDITHLKNNSLYTQEIYTEDFIVAYGDENKFKNTNQVSI
KSLKNENIYIQTPPCDTYDLLKNYSLKDKLHIDYVDYYETILANVKANKGITLTPQSLTT 
RLESMNLYQKKLQSYKRVVGVVSRDKDKMNKIIDIIQ* 

Sequence 2087 

Contig_0706_pos_3655_3065, 
putative peptide of unknown function 
gtgggtgaattcaaattccctagtggatcgataaaagtaacacgtaaaggtattgaaaaa
gatcaggaagagtatttaagacagtttgaaaaagaaaatgataaaagtatagaatttgat 
gctgatgagatgagcgctaaaatcggagaattatttggagtagaatataaagatggtttg 
cctattgataattctggtggtggtggtgctccacctaaatcagaaagttataacccctca 
gtttttaacgctcatctattttctatagtggctagtgaagattctttaaaagaagagtat 
aaatttttcggtttcaaagagaccattatctctttatatactgcctatttatctaattat
aaaaaagataagtgggttactaaaggtggtgctccttatcatgtttcaaatagagaagag 
agcttagagaagaaagcatttgatattataactgggtattatttagaagctatgaatact 
cgtaaagaatggaactttaatactataaaatcttatcaaaagttaataagagttgtgttg 
gtttcttatcaagttatagaacgagaacttatggaggaaaatgaagaataa 


Sequence 2088 

VGEFKFPSGSIKVTRKGIEKDQEEYLRQFEKENDKSIEFDADEMSAKIGELFGVEYKDGL 
PIDNSGGGGAPPKSESYNPSVFNAHLFSIVASEDSLKEEYKFFGFKETIISLYTAYLSNY 
KKDKWVTKGGAPYHVSNREESLEKKAFDIITGYYLEAMNTRKEWNFNTIKSYQKLIRVVL 
VSYQVIERELMEENEE*
Sequence 2089 

Contig_0706_pos_2458_2060,
putative peptide of unknown function

atgcaaatgttacttaaagataaagttgacagtgaaggtaaatctattatctatgaagga 
caaagaggatctgggggaggagaattagatactcatgatcccatgtatttttcgaattta 
attactagtaaaatatttttaattaatcaagtagaaaaacataataagttttttatagct 
gaaaagctttttgaaggttatgaaagtaaatgtatggaaaaatatggaacaaattatcac 
ggagcaataatagagacaagagatgaaaataggatattagccgatgaaacaatacgtaaa
ttttactttttaacgaaagacatggaaaaagagaatgaatatttaagcttagatgattta 
aataaatttagacttttaatatttgcatttactcattga 

Sequence 2090 

MQMLLKDKVDSEGKSIIYEGQRGSGGGELDTHDPMYFSNLITSKIFLINQVEKHNKFFIA
EKLFEGYESKCMEKYGTNYHGAIIETRDENRILADETIRKFYFLTKDMEKENEYLSLDDL 
NKFRLLIFAFTH* 

Sequence 2091 
Contig_0707_pos_498_965,

putative peptide of unknown function 

atgcaattagatgtttttcctgaattaaacgtagatcaactatctcaaaaagtgagaaaa 
atacttaatgcagaaccagataaatatattaaaaatagcttgcgaggattaatagaagag 
cggtatttattgtttattctcgaacaatctggtattaatgatgaaatgactgcacatcat 
ttatcaaatcaacagtttcaaacttttattaatcttctaaagacttttacctttacagta
gatggcacacttccattagacaaagcttttgtgacaggcggcggtatttctttaaaagaa 
atagaaccaaaaactatgatgtctaaattagtgccaggattatttttatgtggcgaagtc 
ctagatattcatggatacactggaggttataatataacaagcgcattagttacgggtcat 
gttgctggaatgtttgctggtgaatttaaaatagatcaaaacaaataa 


Sequence 2092 

MQLDVFPELNVDQLSQKVRKILNAEPDKYIKNSLRGLIEERYLLFILEQSGINDEMTAHH 
LSNQQFQTFINLLKTFTFTVDGTLPLDKAFVTGGGISLKEIEPKTMMSKLVPGLFLCGEV 
LDIHGYTGGYNITSALVTGHVAGMFAGEFKIDQNK* 


Sequence 2093 

Contig_0707_pos_1717_1337, 

putative peptide of unknown function

atgataaatgcaaatttaagtgctagtactagagtgaaacaaaatgctcgtacatcgata 
aatgaaattgttagtaacgctttaagtcaacttaataaagtaaccacaaataaagaagtt
gatgaaatagttaacgaaacgattgaaaaacttaagtcaatacaaataagagaagataaa 
atattgagtagtcaacgttcatcaacatctatgacggaaaaatctaatcaatgttatagt 
tccgagaataatacaattaaatctctaccagaggcaggaaatgctgataaatcactacca 
ttagcaggagttactttaatatctggtttagcaatcatgtcctcacgtaaaaagaaaaaa 
gataaaaaagtaaatgactaa

Sequence 2094 

MINANLSASTRVKQNARTLINEIVSNALSQLNKVTTNKEVDEIVNETIEKLKSIQIREDK 
ILSSQRSSTSMTEKSNQCYSSENNTIKSLPEAGNADKSLPLAGVTLISGLAIMSSRKKKK 
DKKVND*

Sequence 2095 
Contig_0707_pos_0_497,
putative peptide of unknown function 
atggatatttttctttttttgatttttctgctctttataaacaaactgactactggagat
gattcaatgaagacctataagccgtaccgacatcaattaaggcgttcgctatttgcctca 
acgattttcccagtatttatggtgatgattattggtttaataagcttttatgctatttat 
ati :gggtcgaacattgcaccattcatcagcatacctatcaaactcaaaccgaat1 icaa 
cgtatcgacaaacattttcatacgtttgttacgcagcaacaaaaacaatggcgtcatgtt
gatttatcacatccaactgatatcacaaagatgaaacgccaactattaaaacaagtccat
caacaacctgcgatattgtattacgatttaaaaggttcttcacaatctttcacaaacaat 
tatgaacaattagacacaacaaagatgtatttaatatcaaaatatcgaattgattttaaa 
gacgatacttatatcct 

Sequence 2096 

MDIFLFLIFLLFINKLTTGDDSMKTYKPYRHQLRRSLFASTIFPVFMVMIIGLISFYAIY 
IWVEHCTIHQHTYQTQTELQRIDKHFHTFVTQQQKQWRHVDLSHPTDITKMKRQLLKQVH
QQPAILYYDLKGSSQSFTNNYEQLDTTKMYLISKYRIDFKDDTYIL

Sequence 2097 

Contig_0708_pos_1581_3635,

is similar to (with p-value 0.0e+00)

>gp:gp|AF090142|AF090142_1 Staphylococcus epidermidis lipase
precursor (gehD) gene, complete cds. NID: g3789931.

gtgatttctttgacaaataataatagaataaaaagatttagcattagaaaatacgcagtg 
ggagtagtatcaattattacaggtgtaacaattttcatcggagggcaacaagctcaagca
gctgaaacttcagtgcaacatgcggatgcacacccagaagactcgcaaacaacacaacaa 
ttaaaaaatgataaggtagaagaaacgttaaaagcttcaaaacaaggtactgcttatagt 

caacaagtacaaacaattaatcaatctaaaacaaatcaaaacaaccaacattctgtagct
gaaagtgaacaactgaaaagtgatgagacagctaatcagccaaaaaaagaagaaggtagt 
tcagtaaaacaagacgtccaaccatctaaaaatgtaaatcaacaagacgcagctactcaa 
tcaaatgagagaaaaaatattgacataaaaggtgaaggtcaaacttcaaagacaagcaat 
caacatattcagagttctaacagtcataatcaatcaacagaaacaaaagacagcgactca 

gaagaaatcgatcaaccattagtgaaattacaaaagccgtctaatgattctacatatcaa
acacaatcaaaagctaaacaagatagttctaaacagctccctcaagaaaaaacaacaaaa 
cgtcaaatccaaacaactgaaaatgaacagacaactaaagttgattctaaaaaagctaat 
gacactcaaaatgttgaacaacatactcaagagcctaaaaatgatacatcaacatcacaa 
aaaaatcatcatcaagtagctacaaaagaacaatctaatagaagtacaacaagggavjacg 

cacd Tsgci-atcagcaaatgctaatcaaaatcatcagtctacacatcaagcacagt^-caaa
aaccaatatccagtagtatttgtccatggtttcctaggctttgcaggtgataatcaattt 
agtttagctccaaaatattggggtggtacaaaatacaatattgacagaaatttaactaat 
gagggatacaatgtacatgaggcaaatattggtgcttttagtagtaactatgatcgcgca 
gtagaattgtattactatgtcaaaggaggacgtgttgattacggtgcagcgcatgcagct 

aaatatggtcatcatcgctatggtcgaacatacaagggcatcatgcgtgattgggaacct
ggcaaaaaaattcattttataggtcacagtatgggtggtcaaaccattcgtcaaatggaa 
gaattcttaagaaatggtaaccaagaagaaatagaatatcaacgtcaacatgggggtact 
atatccgatttatttacaggtggtaaagataatatggttgcttcaattactacacttggc 
acaccacataatggtacacctgctgcagataaaattggcacacgtaaacttgtaaaagaa 

acgattaatcgtattggtagattaagtggtggtaaagatgtagatatagatttaggtttt
tctcaatggggattaaaacaacaaccaaatgaaagctacattgattacgcggaacgtgta 
tccaaaagtaagatttggaatactgaagatcaagctgttaatgatctgacaacgcaaggt 
gctgaaaaaattaatcaacaaacaagtctaaatcctaatattgtctacactacttataca 
gggtcagcgactcacacaggacctcttggtaatgaattacctaattctagtgaaattctt 

ttgttgaacttaaccagccgtattattggtaaagatgcaaacaaagaaattagaccgaat
gatggtgtagttccagttatatcatcacaacatccttctaatcaagcctttaaaaaagtt 
gatgatcatacaccagctactgataaaggtgtttggcaagtgagacccgttcaacatgat 
tgggaccatttagatttagtaggtatggatgcatttgatttaacacatacaggtagagaa 
tt3ggtcaattctatctaggtattatggataatatcatgagaatcgaagaagcage.:ggt 

attacaaataaataa

Sequence 2098 

VISLTNNNRIKRFSIRKYAVGVVSIITGVTIFIGGQQAQAAETSVQHADAHPEDSQTTQQ
LKNDKVEETLKASKQGTAYSQQVQTINQSKTNQNNQHSVAESEQLKSDETANQPKKEEGS
SVKQDVQPSKNVNQQDAATQSNERKNIDIKGEGQTSKTSNQHIQSSNSHNQSTETKDSDS
EEIDQPLVKLQKPSNDSTYQTQSKAKQDSSKQLPQEKTTKRQIQTTENEQTTKVDSKKAN 
DTQNVEQHTQEPKNDTSTSQKNHHQVATKEQSNRSTTRETQKQSANANQNHQSTHQAQFK 
NQYPVVFVHGFLGFAGDNQFSLAPKYWGGTKYNIDRNLTNEGYNVHEANIGAFSSNYDRA 
VELYYYVKGGRVDYGAAHAAKYGHHRYGRTYKGIMRDWEPGKKIHFIGHSMGGQTIRQME
EFLRNGNQEEIEYQRQHGGTISDLFTGGKDNMVASITTLGTPHNGTPAADKIGTRKLVKE
TINRIGRLSGGKDVDIDLGFSQWGLKQQPNESYIDYAERVSKSKIWNTEDQAVNDLTTQG
AEKINQQTSLNPNIVYTTYTGSATHTGPLGNELPNSSEILLLNLTSRIIGKDANKEIRPN
DGVVPVISSQHPSNQAFKKVDDHTPATDKGVWQVRPVQHDWDHLDLVGMDAFDLTHTGRE
LGQFYLGIMDNIMRIEEADGITNK*

Sequence 2099 
Contig_0710_pos_168_1115,
is similar to (with p-value 7.0e-45) 
>gp:gp|Z99108|BSUB0005_69 Bacillus subtilis complete genome
(section 5 of 21): from 802821 to 1011250. NID: g2633055. >g
p:gp|D78509|D78509_11 Bacillus subtilis YfjG-YfjR genes, com
plete cds. NID: g2780390.

atgacggtgaccgttagatttcaatctttatcgcaacctcttacattagtttcaaatgtg 
aaagagattcctaaagatgcaacgattatatggtatgattttgaaaatgccactgatgaa
gaaaatgagtatttaaaaaatcattttgatttcaattacttagaaatagatgatgctatc 
aatggtgacccacgagttaaatatatagaatatgacgcgtatcaatatatgatatttcat 
agtattattaatgatgattactcaccaatctcactaagtgtatttttagaaggtaatgtt 
ttagtgacataccatcacaaacattttccatcattaaagcgtgtggctcaatacaatgca 
gaaaatcatgatagtgaattagattgtgcagacatcgtcattcatattctggattgtatg
gtggataaatattttaactttgtttatggtattgaagataaagtgtataattttgaagct
aagcatgtcgatgaccgctatagtaagagcgttatggaaaatgtctttcaattacgttcg 
gatttaattaaaatcaaacgcgtattatttccgatgcaagaagttgtagatacaatgaaa 
caagaaggaaatataattaaagatgccaaacatagaatgtatattcaacatattgatgat 
catcttattaaacaaagaagtgttattcggacttctcaagaaatgacgaatgagattcgt
gaaaattatgaatcattcacctcatttaggatgaatagtataatgcagatacttacgctt 
gtatctgttatattctcaccactcacttttattgctggtgtatatggaatgaactttgaa 
tttatgcctgagttgaaatggcattatgcttatttcgtgtgcttaactttaatgctaatt 
ataacaataatattaatcatattctttaaaaagaaaaaatggttttaa 

Sequence 2100 

MTVTVRFQSLSQPLTLVSNVKEIPKDATIIWYDFENATDEENEYLKNHFDFNYLEIDDAI 
NGDPRVKYIEYDAYQYMIFHSIINDDYSPISLSVFLEGNVLVTYHHKHFPSLKRVAQYNA 
ENHDSELDCADIVIHILDCMVDKYFNFVYGIEDKVYNFEAKHVDDRYSKSVMENVFQLRS 
DLIKIKRVLFPMQEVVDTMKQEGNIIKDAKHRMYIQHIDDHLIKQRSVIRTSQEMTNEIR
ENYESFTSFRMNSIMQILTLVSVIFSPLTFIAGVYGMNFEFMPELKWHYAYFVCLTLMLI 
ITIILIIFFKKKKWF* 

Sequence 2101 
Contig_0710_pos_1206_1763,

is similar to (with p-value 2.0e-29) 

>gp:gp|AF016485|AF016485_60 Halobacterium sp. NRC-1 plasmid
pNRC100, complete plasmid sequence. NID: g2822278. >gp:gp|AF
016485|AF016485_149 Halobacterium sp. NRC-1 plasmid pNRC100,
complete plasmid sequence. NID: g2822278.

atgtcacaaaaagatgcgctggtttcagattttgataaagtgagatttgttcatcattcc 
atccccagtattgatgttagtcaagtcgatatgacaagtcatactacgaaattcgatttg 
gcatatccaatctatataaatgcaatgactggtggaagtgattggacaaaacaaattaat 
gaaaaattagcaattgttgctagagaaactggaattgcaatggcggtgggatcaacacat 

gcagctttgcgcaatcctaatatgattgaaacatttagcattgtgcgtaaaacaaatccc
aaaggaacaattttcagcaatgtgggtgccgatgtaccagtggataaagctctacaagcg 
gttgaattattagatgctcaagcgctacaaattcatgtgaactcacctcaagaattagtc 
atgcctgaagggaaccgtgaatttgcttcatggatgtcaaatattgaatctattgttaaa 
cgcgttgatgttccagttattattaaagaagttggtttcggaatgagtaaagaaacatta 

caagcgttatatgattaa

Sequence 2102 

MSQKDALVSDFDKVRFVHHSIPSIDVSQVDMTSHTTKFDLAYPIYINAMTGGSDWTKQIN 
EKLAIVARETGIAMAVGSTHAALRNPNMIETFSIVRKTNPKGTIFSNVGADVPVDKALQA
VELLDAQALQIHVNSPQELVMPEGNREFASWMSNIESIVKRVDVPVIIKEVGFGMSKETL
QALYD* 

Sequence 2103 
Contig_0710_pos_1830_3305,

is similar to (with p-value 4.0e-90)

>gp:gp|AF006665|AF006665_31 Bacillus subtilis 168 region at
182 min containing the cge gene cluster. NID: g2529445. >gp:
gp|AF015775|AF015775_7 Bacillus subtilis YodA (yodA), YodB (
yodB), YodC (yodC), YodD (yodD), ABC-transporter (yodE), per
mease (yodF), proteinase (ctpA), YodH (yodH), YodI (yodI), c
arboxypeptidase (yodJ), purine nucleoside phosphorylase (deo
D), YodL (yodL), YodM (yodM), YodN (yodN), YodO (yodO), YodP
(yodP), acetylornitine deacetylase (argE), butirate-acetoac
etate CoA transferase (yodR), butyrate acetoacetate-CoA tran
sferase (yodS), YodT (yodT), CgeE (cgeE), CgeD (cgeD), CgeC
(cgeC), CgeA (cgeA), CgeB (cgeB), YzxA (yzxA), UDP-glucose e
pimerase (yodU), YodV (yodV), and YodW (yodW) genes, complet
e cds; and YodZ (yodZ) gene, partial cds. NID: g2415383. >gp
:gp|Z99114|BSUB0011_121 Bacillus subtilis complete genome (s
ection 11 of 21): from 2000171 to 2207900. NID: g2634230.
atgaatgatcatcaaaaaaatcatgcaacatctcaagatgataacacaatgtcaacacca 
tctaagaatagcaagcatataaaaattaaattatggcatttcatactcgttattttgggt 
attattcttttaacatccatcattactgtagtatcaacaattttaattagccatcaaaaa 

agtggtttaaataaagaacaacgtgcaaatttaaaaaaaattgaatatgtctatcaaaca
cttaataaagattattacaaaaagcaaagttctgataaattaactcaatctgccatagat 
ggtatggttaaagaacttaaagatccatattcagaatatatgactgctgaagaaacaaaa 
caatttaatgaaggtgtatcaggtgatttcgttggcataggtgctgaaatgcaaaagaaa 
aatgaacagataagtgttactagcccaatgaaggattcaccagcagaaaaagctggtatt 

caacctaaagatatcgtcacacaagtgaatcatcattcggtagtcggtaaaccacttgat
caagttgttaaaatggtccgcggcaaaaaaggaacatatgttactttaactataaaacgt 
ggttcgcaagaaaaggatattaagattaaacgcgataccattcacgttaagagtgtagag 
tatgctgaagaaaggcaatgtaggcgtactaacaatcaataaattccaaagcaatacttct 
ggtgaactcaaatctgcaatcatcaaagcgcataagcaaggcatccgtcatatcatttta 

gatttgagaaataatccgggggggttattagatgaggcagtcaagatggctaacatcttt
attgataagggaaatactgtcgttcaattagaaaaaggtaaggataaggaagaattaaaa 
acttctaatcaagcactaaaacaagcaaaagatatgaaagtatccatcttagttaatgag 
ggatcagctagtgcttcagaagtgtttacaggtgctatgaaagactatcataaagctaaa 
gtttacggttctaaaacatttggtaaaggtatcgttcagaccactcgtgaatttagtgat 

ggttcattaattaaatatacagagatgaaatggctaacgcctgatggccattatattcat
ggtaaaggaattagaccagatgttagtatctcaacaccaaaataccaatcactcaatgtc 
attccagataacaaaacttatcatcaaggtgaaaaagataaaaatgttaaaacgatgaaa 
ataggtctaaaagctttaggttatccaattgataacgaaacaaacatatttgacgaacaa 
ttagaatctgctattaaaacatttcaacaagacaataatttaaaagttaatggcaatttt 

gataaaaaaacaaatgataaatttactgaaaaactagttgaaaaagcgaataaaaaagat
actgttttaaacgatttactaaacaaactaaaataa 

Sequence 2104 

MNDHQKNHATSQDDNTMSTPSKNSKHIKIKLWHFILVILGIILLTSIITVVSTILISHQK 
SGLNKEQRANLKKIEYVYQTLNKDYYKKQSSDKLTQSAIDGMVKELKDPYSEYMTAEETK
QFNEGVSGDFVGIGAEMQKKNEQISVTSPMKDSPAEKAGIQPKDIVTQVNHHSVVGKPLD
QVVKMVRGKKGTYVTLTIKRGSQEKDIKIKRDTIHVKSVEYEKKGNVGVLTINKFQSNTS
GELKSAIIKAHKQGIRHIILDLRNNPGGLLDEAVKMANIFIDKGNTVVQLEKGKDKEELK
TSNQALKQAKDMKVSILVNEGSASASEVFTGAMKDYHKAKVYGSKTFGKGIVQTTREFSD
GSLIKYTEMKWLTPDGHYIHGKGIRPDVSISTPKYQSLNVIPDNKTYHQGEKDKNVKTMK
IGLKALGYPIDNETNIFDEQLESAIKTFQQDNNLKVNGNFDKKTNDKFTEKLVEKANKKD 
TVLNDLLNKLK* 

Sequence 2105
Contig_0710_pos_4812_3682,

is similar to (with p-value 7.0e-51) 

>sp:sp|P53434|YRP2_LISMO HYPOTHETICAL 41.4 KD PROTEIN IN RPO
D 3'REGION (ORFA2). >gp:gp|U17284|LMU17284_3 Listeria monocy
togenes major sigma factor (rpoD) gene, partial cds, and dow
nstream orfA1 and orfA2 genes, complete cds. NID: g687597.
atggttggtaatgtaggattattaattggtaatgataagttagatattacaggtattctg 
acaacactcgactgcaccgatgatgttgttaaccaagcaattgaacttaataccaatacc 
atcattgctcatcatccacttattttcaaaggagtaaaacgtatcgttgaagatggatat 

ggtagtataattcgtaaacttatccaaaataatatcaatcttatagcattacacactaat
cttgatgtaaatcctaaaggtgtcaatcgaatgttagcggatcaaataggtttagagaac 
atatcaatgattaatacaaatagctcatattattacaaagttcaaacttttatacctaaa 
aattatattgaagatttcaaagacagtttaaacgaacttggattagctaaagaaggtaat 
tacgaatattgtttctttgaaagtgaaggtaaagggcaatttaaaccagtaggtgatgca 

agtccttatatagggaagttagatagtatcgaatatgttgatgaaataaaacttgagttt
atgataaaagacaatgaattagaaataactaaacgtgctattttagataatcacccatac 
gaaacaccagtttttgattttattaaaatgaacaaagaaagtgagtatggattagggatt 
attggacaattaaaccaaactatgactttagatgaattttctgaatatgccaaaaaacag 
ctcaatataccgagcgtacgatatacaggtcaacatgatagtccaattaagaaagtagct 

atcataggtggttcaggtataggatttgagtataaagctagccaacttggagcagatgtt
tttgttactggtgatattaaacaccatgatgctttagatgctaaaatccaaaatgtaaat 
ttattagacatcaatcattatagtgagtatgttatgaaagaaggattaaaagaattatta 
gaaaaatggttatttaaatatgaaaatcaatttccaatatatgcttctgaaatcaacaca 
agtgttttattttcgatgtcttcttcaaaatgctctttcgcgggatattcatcatcccat

tggttgttgttgaattcttccataatctgtttagcctctttgactaattga

Sequence 2106 

MVGNVGLLIGNDKLDITGILTTLDCTDDVVNQAIELNTNTIIAHHPLIFKGVKRIVEDGY 
GSIIRKLIQNNINLIALHTNLDVNPKGVNRMLADQIGLENISMINTNSSYYYKVQTFIPK 
NYIEDFKDSLNELGLAKEGNYEYCFFESEGKGQFKPVGDASPYIGKLDSIEYVDEIKLEF
MIKDNELEITKRAILDNHPYETPVFDFIKMNKESEYGLGIIGQLNQTMTLDEFSEYAKKQ 
LNIPSVRYTGQHDSPIKKVAIIGGSGIGFEYKASQLGADVFVTGDIKHHDALDAKIQNVN 
LLDINHYSEYVMKEGLKELLEKWLFKYENQFPIYASEINTSVLFSMSSSKCSFAGYSSSH 
WLLLNSSIICLASLTN* 

Sequence 2107 

Contig_0711_pos_2494_2901,

putative peptide of unknown function 

atgaaattatctatttttcatgatggtcaattttttgtaggtgtcgttgaataccaagaa 
ggcttcattcataaatatctaaaagttacatttggcaatgaacctagcgatgaaacagtg
ttacgattcataacttttaaacttattcctttattaaatcaaacacacggtaagaagaaa 
cctattcaaaagcataaaaagattaatccaaaacgtttacaacgtaaaatcgctaaagaa 
caaaaagagaccaatttaactacatttgctcaacaagcgattaaagaagaacaagagttg
aataagctaaagagtaaaaaacttcagcgattagaaaaagaacgacacagacaatacaaa 
agaatgttaaaaagaaaaaaagcacatgaaaagcacaaaggtcactaa

Sequence 2108 

MKLSIFHDGQFFVGVVEYQEGFIHKYLKVTFGNEPSDETVLRFITFKLIPLLNQTHGKKK 
PIQKHKKINPKRLQRKIAKEQKETNLTTFAQQAIKEEQELNKLKSKKLQRLEKERHRQYK 
RMLKRKKAHEKHKGH*

Sequence 2109 

Contig_0711_pos_3571_3053, 
putative peptide of unknown function 
atgttagaaacacatagattaaagctagtgaagcctaatttgagttatacagatgaactt
tatcaattgcatacaaataaggtagctacaaagtatacacctaaaggtattcatcagaat 
aaagtagcaacccaagattttattaaaggatggatgaggcattgggatgaatatcaattt 
ggttacttcattttaattatgagagataatcacgaagtagtggggatagcgggatttgag
tatcgtacaattcatcaacaacagtttcttaatgcgtattatagaatctttccatcgtat
actggtgttggtttagcttttgagtcaatggaggagattgcccgtcatttaaaaaagcat
gataccataacaccaaaattaattcgaacaaatcaatataatacaaattctattaaatta 
gcacaaaaactcggatataattatgatgctaactgggacgatgtaattaataaaggagat 
cgttgtttttttaacctacaagcgttggataataactaa 

Sequence 2110 

MLETHRLKLVKPNLSYTDELYQLHTNKVATKYTPKGIHQNKVATQDFIKGWMRHWDEYQF 
GYFILIMRDNHEVVGIAGFEYRTIHQQQFLNAYYRIFPSYTGVGLAFESMEEIARHLKKH 
DTITPKLIRTNQYNTNSIKLAQKLGYNYDANWDDVINKGDRCFFNLQALDNN* 

Sequence 2111 

Contig_0711_pos_703_359,

putative peptide of unknown function 

atggagatgatagaagaacgtaatttatcagggcttattcaaacactaactttcaatcat 
cccatcattcaaattcttaaagagaacacattaaatcaacttaaaatactctctcantat
ttaci^ayagcgacaccctgcaatggtggcaattcaatcttggtcacaatggttta:tgat 
catgggattactgaaatccaccttgatgtaactgcacaagcgcctagatcttattacaaa 
ggtatttttataaaatgtcatcttaaaaatactgctcatagcgttttgacaggtggatat 
tatcacggttcactagaaggttttggtttaggattaacactttaa 

Sequence 2112 

MEMIEERNLSGLIQTLTFNHPIIQILKENTLNQLKILSHYLPERHPAMVAIQSWSQWFTD 
HGITEIHLDVTAQAPRSYYKGIFIKCHLKNTAHSVLTGGYYHGSLEGFGLGLTL* 

Sequence 2113

Contig_0711_pos_0_334, 

putative peptide of unknown function 

atgttacgagttgcattagcaaagggtcgtttattaaagagttttatcgaatatttacaa 
caagttaatcagatagatattgcaactgtacttttaaatagacagcgacagttattgctt 
acagtcgacaacattgaaatgattttagttaaaggaagcgatgtgcctacttatgtagaa
caaggtattgctgatgtaggaatagtgggaagtgatattctgaatggtcaaaaatataat 
attaataaattactcgatttgccatttggtaaatgtcattttgcgttggcggcaaagcca 
gaaacatctcgctataaaaaagtagcaagtattg

Sequence 2114

MLRVALAKGRLLKSFIEYLQQVNQIDIATVLLNRQRQLLLTVDNIEMILVKGSDVPTYVE 
QGIADVGIVGSDILNGQKYNINKLLDLPFGKCHFALAAKPETSRYKKVASIX 

Sequence 2115 
Contig_0712_pos_299_928,

is similar to (with p-value 1.0e-52)

>sp:sp|Q02499|KPYK_BACST PYRUVATE KINASE (EC 2.7.1.40) (PK).
>pir:pir|S29783|S29783 pyruvate kinase (EC 2.7.1.40) (versi
on 2) - Bacillus stearothermophilus >gp:gp|D13095|BACPK_3 B.
stearothermophilus phosphofructokinase and pyruvate kinase
genes. NID: g285620.

atgcctttttctaaacctataattgcgcttggtgaagtaataccattttcttctgtaatt 
agacctatagctttttcaacatatggtactaatgtttcatcaacagaatttgtaataata 
actttatcagataaatctttaccttctaaatcactagcactatctgcgacaattgcatgg 

cctacaacagatcctctaccaacaccttggcctttagcaatctcatcacctactaagtgg
attttcatcatatttgtagttcctttttctccagtaggtacaccagcagtaataataatt 
aaatctccgtttgaaactctaccagtttctactgctgttgctacagcattatttagtaaa 
gcatcagttgttttacgtccttctttaacgaccggatttactccccatacaattgcacat 
tgtctagcagttttttcgctaggtgttacagcaatgatatctgaatgtggacgatattta 

gaeatcgtacgtgctgttgaaccactttcagttgctgctacaatagcttttacatttaag
tttaaggcagtatgtgcaacagaaacaccaattgcatttactaatgaagtttcaactaat 
ttagtacgatcacttaataattttttatag 

Sequence 2116
MPFSKPIIALGEVIPFSSVIRPIAFSTYGTNVSSTEFVIITLSDKSLPSKSLALSATIAW
PTTDPLPTPWPLAISSPTKWIFIIFVVPFSPVGTPAVIIIKSPFETLPVSTAVATALFSK
ASVVLRPSLTTGFTPHTIAHCLAVFSLGVTAMISECGRYLEIVRAVEPLSVAATIAFTFK 
FKAVCATETPIAFTNEVSTNLVRSLNNFL* 

Sequence 2117 

Contig_0712_pos_5629_0,

is similar to (with p-value 2.0e-21) 

>gp:gp|Z99125|MLCL536_19 Mycobacterium leprae cosmid L536. N
ID: g2398683. >gp:gp|U00013|U00013_5 Mycobacterium leprae co
smid B1496. NID: g466868. >gp:gp|Z99125|MLCL536_19 Mycobacte
rium leprae cosmid L536. NID: g2398683.

atgaattttaataatttggatcaattatatagatctgtaattatggatcattacaaaaac 
cctagaaacaaaggtgtcttagacaatggctcaatgactgttgatatgaataaccctaca 

tgtggtgatcgcatacgtttgacatttgatattgaagacggaatcattaatgatgctaag
tttgaaggagaaggatgttcaatttcaatgtctagtgcatctatgatgactgaagcagtt 
aaaggtcattcacttggtgaagcaatgcaaatgagccaagagtttactaaaatgatgctc 
ggtgaagactacgagattacagaagaaatgggagatattgaggcgcttcaaggtgtctca 
caattcccagctagaattaaatgtgcaacgcttgcatggaaagcattagaaaaagggaca 

gtcgaaaaagaaggtaagtcagaaggtaca

Sequence 2118 

MNFNNLDQLYRSVIMDHYKNPRNKGVLDNGSMTVDMNNPTCGDRIRLTFDIEDGIINDAK 
FEGEGCSISMSSASMMTEAVKGHSLGEAMQMSQEFTKMMLGEDYEITEEMGDIEALQGVS 
QFPARIKCATLAWKALEKGTVEKEGKSEGT

Sequence 2119

Contig_0712_pos_5603_5016,

is similar to (with p-value 3.0e-27) 

>sp:sp|P16468|MAOX_BACST MALATE OXIDOREDUCTASE (NAD) (EC 1.1
.1.38) (MALIC ENZYME). >pir:pir|A33307|DEBSXS malate dehydro
genase (oxaloacetate-decarboxylating) (EC 1.1.1.38) - Bacill
us stearothermophilus >gp:gp|M19485|BACMAL_1 B. stearothermop
hilus malic acid gene, complete cds. NID: g143164.

atgcttgaactaattgatctacatcttctttcgtattataaacgtaaaaacttgcacgtg
ctgttgaagagacacctaaccatttcatcaaaggttgcgcacagtgatggcctgcacgaa 
ctgcaacaccttctaggtgtatctgtagcggatatattaactcaagatatggtgaagaca 
atggctgatgatccaattatttttgctatggctaatcctaatcctgaaatcaatcccaat 
gaagccaaacaggcaggtgcaaaggtcgtaggtacaggtcgatctgattttccaaaccaa 

attaataatgtattagcatttccaggtatttttagaggtgcattggatgttgaagccact
catattaacgaagatatgaaaaaggcagctgtagaagctatcgttcatttaatagacgaa 
aatgagttacatcctgattactgtataccaggaccatttgataaaagagtagctccatca 
gtagccaaaaatgtagctaaagctgctatggaatccggtgtagcaagaattaaaattgat 
acacaagaaatatttgataaaactatgaaacttactgacttaaaatag 

Sequence 2120 

MLELIDLHLLSYYKRKNLHVLLKRHLTISSKVAHSDGLHELQHLLGVSVADILTQDMVKT
MADDPIIFAMANPNPEINPNEAKQAGAKVVGTGRSDFPNQINNVLAFPGIFRGALDVEAT
HINEDMKKAAVEAIVHLIDENELHPDYCIPGPFDKRVAPSVAKNVAKAAMESGVARIKID
TQEIFDKTMKLTDLK*

Sequence 2121 
Contig_0712_pos_2893_1970, 
is similar to (with p-value 0.0e+00)
>sp:sp|P00512|K6PF_BACST 6-PHOSPHOFRUCTOKINASE (EC 2.7.1.11)
(PHOSPHOFRUCTOKINASE) (PHOSPHOHEXOKINASE). >pir:pir|A27474|
KIBSFF 6-phosphofructokinase (EC 2.7.1.11) - Bacillus stearo
thermophilus >gp:gp|M15643|BACPFK_1 B. stearothermophilus 6-p
hosphofructo-1-kinase gene, complete cds. NID: g143311. >gp:
gp|D13095|BACPK_2 B. stearothermophilus phosphofructokinase
and pyruvate kinase genes. NID: g285620.

atgaacgctgctgttcgtgcagtgactcggacagcaatttacaataatattgaagtttat 
ggtgtttatcaaggttaccaaggtttacttgatgatgatattcataagcttgaattgggt 
tcagtaggggatacaattcaacgaggaggaactttcctcttttccgcaaggtgccctcag
ttcaaagaagaggatgtacgtaagaaagctattgagaatttacgtaagcgtggtatcgaa 
ggtttagttgttattggaggagatggcagctatagaggggcacaacgaattagtgaggaa 
tgtaaagaaattcaaacaattggtattcctggtacaattgataatgatattaatggtaca 
gattttacaattggttttgatactgcattaaacactattattgaatcagtcgataagatt 

agagatacggcatcaagtcacgcaagaacgtttattgttgaagttatggggcgtgattgt
ggagatttagctttatgggctggattatctgtaggtgctgaaacgattgttttaccagaa 
gtcaatacagatattaaggatgtagctgaaaagattgaacagggtattaaaagagggaaa 
aaacattctatcgttatggttgcagaaggttgtatgagcggccaagaatgtgcagatgag 
ttaacgaagtatattaacattgatacacgagtttcagtgttaggtcacattcaacgtggc 

ggtagcccatctggtgctgatcgagtattagcttctcgacttggtggatatgctgttgaa
ctattaaaacaaggcgagacagctaaaggtgttggcattaggaataatcaattaacctct 
acgccgtttgatgaaatttttgctgaaagtgatcgcaaatttaatagtcaaatgtatgaa 
ttagcaaaagaattatcaatttaa 

Sequence 2122

MNAAVRAVTRTAIYNNIEVYGVYQGYQGLLDDDIHKLELGSVGDTIQRGGTFLFSARCPQ 
FKEEDVRKKAIENLRKRGIEGLVVIGGDGSYRGAQRISEECKEIQTIGIPGTIDNDINGT 
DFTIGFDTALNTIIESVDKIRDTASSHARTFIVEVMGRDCGDLALWAGLSVGAETIVLPE 

VNTDIKDVAEKIEQGIKRGKKHSIVMVAEGCMSGQECADELTKYINIDTRVSVLGHIQRG 
GSPSGADRVLASRLGGYAVELLKQGETAKGVGIRNNQLTSTPFDEIFAESDRKFNSQMYE
LAKELSI* 

Sequence 2123 
Contig_0712_pos_1946_189,

is similar to (with p-value 0.0e+00)

>sp:sp|P51181|KPYK_BACLI PYRUVATE KINASE (EC 2.7.1.40) (PK).
>pir:pir|JC4220|JC4220 pyruvate kinase (EC 2.7.1.40) - Baci
llus licheniformis >gp:gp|D31955|BACPYK2_2 Bacillus lichenif
ormis gene for pyruvate kinase, complete cds. NID: g1041098.

atgagaaagactaaaattgtatgtacaataggaccagcttcagaatcagaggaaatgctc
gaaaaactaatgaatgcaggaatgaacgttgcgcgtttaaatttctcacatggtagtcat 
gaagaacataaagcaagaattgatacaattcgtaaagttgctaaacgtttaaataaaaca 
att 3';'::t t-gttattggatactaaagggccagaaattcgtacgcacaatatgaaag.itgga 
cttattgttttagaaaaaggcaaagaagtcattgtcagtatgaatgaagttgaaggaaca 

cctgaaaaattctctgtaacatatgaaaatctaatcaatgatgtcaatattggatcatat
atactattagatgatggtttagttgaacttcaagtcaaagaaattaacaaagataaaggc 
gaagttaaatgtgatatcttaaatactggtgaattaaaaaataaaaaaggtgttaactta 
cctggtgttaaagttaatttacctggtatcactgataaagatgccgatgatatcagattt 
ggtataaaggaaaatgtagactttatagctgcaagttttgtaagacgtccaagtgatgtt 

ttagatatccgtcaaattcttgaagaagaaaaagcagaaataacaattttccctaaaatc
gaaaaccaagaaggtatcgataatattgaagaaattcttgaagtatctgatggattaatg 
gtagcacgtggtgatatgggtgttgaaattccaccagaaagcgtaccaatggttcaaaaa 
gatttaattagaaaatgtaataaattaggaaaacctgtaattactgcgactcaaatgctt 
gattctatgcaacgtaatccacgtgcgacacgtgcagaagcaagtgacgtagctaatgca 

atatacgatggtactgacgctgtaatgttatcaggcgaaactgcagcaggtcaatatcct
gaagaagctgttaaaactatgcgtaatattgcagtttctgctgaagcagcgcaagactat 
aaaaaattattaagtgatcgtactaaattagttgaaacttcattagtaaatgcaattggt 
gtttctgttgcacatactgccttaaacttaaatgtaaaagctattgtagcagcaactgaa 
agtggttcaacagcacgtacgatttctaaatatcgtccacattcagatatcattgctgta 

acacctagcgaaaaaactgctagacaatgtgcaattgtatggggagtaaatccggtcgtt
aaagaaggacgtaaaacaactgatgctttactaaataatgctgtagcaacagcagtagaa
actggtagagtttcaaacggagatttaattattattactgctggtgtacctactggagaa
aaaggaactacaaatatgatgaaaatccacttagtaggtgatgagattgctaaaggccaa
ggtgttggtagaggatctgttgtaggccatgcaattgtcgcagatagtgctagtgattta
gaaggtaaagatttatctgataaagttattattacaaattctgttgatgaaacattagta
ccatatgttgaaaaagctataggtctaattacagaagaaaatggtattacttcaccaagc 
gcaattataggtttagaaaaaggcatacctactgttgttggtgtagaacaagcaactaaa 
gaaattaaaaatgatatgttagtgactttagatgcgtcacaaggtaaagtgtttgaaggt 
tatgctaacgtcctttaa

Sequence 2124 

MRKTKIVCTIGPASESEEMLEKLMNAGMNVARLNFSHGSHEEHKARIDTIRKVAKRLNKT 
IGLLLDTKGPEIRTHNMKDGLIVLEKGKEVIVSMNEVEGTPEKFSVTYENLINDVNIGSY 

ILLDDGLVELQVKEINKDKGEVKCDILNTGELKNKKGVNLPGVKVNLPGITDKDADDIRF
GIKENVDFIAASFVRRPSDVLDIRQILEEEKAEITIFPKIENQEGIDNIEEILEVSDGLM
VARGDMGVEIPPESVPMVQKDLIRKCNKLGKPVITATQMLDSMQRNPRATRAEASDVANA
IYDGTDAVMLSGETAAGQYPEEAVKTMRNIAVSAEAAQDYKKLLSDRTKLVETSLVNAIG
VSVAHTALNLNVKAIVAATESGSTARTISKYRPHSDIIAVTPSEKTARQCAIVWGVNPVV
KEGRKTTDALLNNAVATAVETGRVSNGDLIIITAGVPTGEKGTTNMMKIHLVGDEIAKGQ
GVGRGSVVGHAIVADSASDLEGKDLSDKVIITNSVDETLVPYVEKAIGLITEENGITSPS
AIIGLEKGIPTVVGVEQATKEIKNDMLVTLDASQGKVFEGYANVL* 

Sequence 2125 
Contig_0714_pos_181_783,

is similar to (with p-value 2.0e-33) 

>gp:gp|D50453|D50453_106 Bacillus subtilis DNA for 25-36 deg
ree region containing the amyE-srfA region, complete cds. NI
D: g1805369.

atgcttccaaagctgtttacgttggtaaagatttatataaaaggtaagtcataccttcct
ttcgtaaatcgatctgctgtagcaggtattttaacaactggtgtcatgcgcaccttattg 
tttttagctgtactaggtgttgttgtaactggcgttacgcttagttcagaaaatccacca 
gcatcagttttccaacatgcattaggtcctataggtaaaaatatttttggcgtagtaata
tttgcagcagcaatgtcctcagtaattggttctgcatatacaagcgcaacatttttaaaa 

acactacrrjcaaatcgttactcaataaaaataatcttatcgttattacatttattgvaatt
tcaacttttgttttcttatttattggtaaaccggtgagtttacttataatagctggtgcg 
attaatggttggattctaccaatcacattaggtgcaattctcattgcaagtaggaaaaaa 
tctatcgttggtaattaccaacacccaacatggatgcttgtttttggtattatagccgta 
attgtcacaataatgactggtatcttttcattacaagatttagcaagtctttggaaaggt 

taa

Sequence 2126

MLPKLFTLVKIYIKGKSYLPFVNRSAVAGILTTGVMRTLLFLAVLGVVVTGVTLSSENPP
ASVFQHALGPIGKNIFGVVIFAAAMSSVIGSAYTSATFLKTLHKSLLNKNNLIVITFIVI
STFVFLFIGKPVSLLIIAGAINGWILPITLGAILIASRKKSIVGNYQHPTWMLVFGIIAV
IVTIMTGIFSLQDLASLWKG*

Sequence 2127 

Contig_0714_pos_1712_2215, 
is similar to (with p-value 4.0e-46)

>sp:sp|P54452|YQEG_BACSU HYPOTHETICAL 20.1 KD PROTEIN IN NUC
B-AROD INTERGENIC REGION. >gp:gp|D84432|BACJH642_91 Bacillus
subtilis DNA, 283 Kb region containing skin element. NID: g
2627063. >gp:gp|Z99117|BSUB0014_48 Bacillus subtilis complet
e genome (section 14 of 21): from 2599451 to 2812870. NID: g
2634966.

atgccaaatgcatatgtgaaatcaatatttgaaattgatatagaaaaacttgccgatagt 
ggtgttaaaggtatcataactgatttagataatacacttgttggttgggatgttaaagaa 
cctactaagggtgttaaatcatggtttgctaaggctaaagatttaggaataactgtcaca 
attgtgtcaaataataataaaagtcgagtatcaagtttctcaagtaatttaggtgtagat
tatatattcaaagcacgtaaaccgatggggaaagcctttaagatggctattaaaaaaatg 
aaaattcaaccgagagaaaccgttgttgtaggagatcaaatgcttactgatgtgtttggt
ggcaattgtaatggtttatatacaattatggtagtacctgttaaacggactgatggatta
attacaaagtttaatcgattaattgaaagacgattattaaatcattttagaaaaaaaggt
tatattaaatgggaggaaaattga

Sequence 2128 

MPNAYVKSIFEIDIEKLADSGVKGIITDLDNTLVGWDVKEPTKGVKSWFAKAKDLGITVT 
IVSNNNKSRVSSFSSNLGVDYIFKARKPMGKAFKMAIKKMKIQPRETVVVGDQMLTDVFG
GNCNGLYTIMVVPVKRTDGLITKFNRLIERRLLNHFRKKGYIKWEEN* 

Sequence 2129 

Contig_0714_pos_2822_3316, 
is similar to (with p-value 6.0e-39)

>sp:sp|P54453|YQEH_BACSU HYPOTHETICAL 41.0 KD PROTEIN IN NUC
B-AROD INTERGENIC REGION. >gp:gp|D84432|BACJH642_92 Bacillus
subtilis DNA, 283 Kb region containing skin element. NID: g
2627063. >gp:gp|Z99117|BSUB0014_47 Bacillus subtilis complet
e genome (section 14 of 21): from 2599451 to 2812870. NID: g
2634966.

atgatagatattccattagacgaaaaatcatttatgtttgatacaccaggtatcattcaa 
tcacatcaaatgacaaattatgtatatgaaaatgagttgaaaatcattatacctaaaaat 
gaaataaagcaacgtgtgtatcaacttaatgaaaaacagacattatttttcggaggattg 

gcacgcattgattatgtatctggtggtaaaagaccacttgtttgtttcttttcaaatgat
ttaacitattcatagaactaaaaccgagaaagctaatgatttatggaaatcccaattaggc 
gcattgctttcaccgcctcaagatgcacaacaatttaatcttaatgatgtaaaagcagta 
agactggaaactggtaaaactaaacgtgacatcatgatatctggtttaggattcataact 
attgatgctggtgcaaaagtgatagttcgtgttccaaaacatgtagatgttattttaaga 

aattcaattctttaa

Sequence 2130 

MIDIPLDEKSFMFDTPGIIQSHQMTNYVYENELKIIIPKNEIKQRVYQLNEKQTLFFGGL
ARIDYVSGGKRPLVCFFSNDLNIHRTKTEKANDLWKSQLGALLSPPQDAQQFNLNDVKAV
RLETGKTKRDIMISGLGFITIDAGAKVIVRVPKHVDVILRNSIL*

Sequence 2131 

Contig_0714_pos_6740_7852,

is similar to (with p-value 0.0e+00)

>pir:pir|A55856|A55856 llm protein - Staphylococcus aureus

gtgaggtacaacttattcaatgaaggtgaactgatgtatacactattacttatagctttt
actatgatagtcagtttaataattacacccattattattgtaatatcaaaaaaattagat 
ttagtagatcgtcctaatttcagaaaagtacatacgaaacctatctcagtgatgggagga 
acggtcattttattttctttcttaatagggatttggctcggacaccctattgaacgtgag 

gttaaaccgcttatattaggtgcaattacaatgtatatggttggattgattgatga r.att
tacgacf-^aagaccttatttaaagttagcaggtcaaattgttgcagctttaattg^: cacg 
ttttatggaattacaatagactttatttcattgccaattggtccaacgattcattttggc
atattcagcattcctattacagtaatatggattgtagcaattaccaatgctattaatctt 
atcgacggacttgatggacttgcctcaggcgtctcagcattggcattaatgactattgga 

ttcatcgctattttacaagcgaacatatttattatcatgatttgctgtgtacttttaggg
tctttacttggtttcttattctataactttcacccagcgaaaattttcctaggtgatagt
ggtgcattaatgataggatttattatcggtttcttatccttactcggctttaagaatatc 
acatttattgcattattctttcctatagttatattagcggtgccatttattgatacatta 
tttgcaatgattcgtcgaatgaaaaaagggcaacatataatgcaagcggacaagtcacat 

ttacatcataaattacttgctttaggatatacgcatagacaaaccgttttacttatttat
tcaatagcgattatgtttagtttatctagtgttatcctctatttatcccaaccgttgggt 
gcacttatgatgttcattctcattgtctttacgattgagttgatcgttgaatttactgga 
ttaatagatgataattatcgaccaatattaaatttaattacaaaaaaaggaaatggtaag 
caacatcattatgatgagcatcaccgttcataa 

Sequence 2132 

VRYNLFNEGELMYTLLLIAFTMIVSLIITPIIIVISKKLDLVDRPNFRKVHTKPISVMGG 
TVILFSFLIGIWLGHPIEREVKPLILGAITMYMVGLIDDIYDLRPYLKLAGQIVAALIVT 
FYGITIDFISLPIGPTIHFGIFSIPITVIWIVAITNAINLIDGLDGLASGVSALALMTIG
FIAILQANIFIIMICCVLLGSLLGFLFYNFHPAKIFLGDSGALMIGFIIGFLSLLGFKNI
TFIALFFPIVILAVPFIDTLFAMIRRMKKGQHIMQADKSHLHHKLLALGYTHRQTVLLIY 
SIAIMFSLSSVILYLSQPLGALMMFILIVFTIELIVEFTGLIDDNYRPILNLITKKGNGK 

QHHYDEHHRS* 

Sequence 2133 

Contig_0714_pos_8431_7913,

is similar to (with p-value 7.0e-41)

>gp:gp|Z99122|BSUB0019_48 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|U56901|BSU56901_2 Bacillus subtilis putative transcri
ptional regulator (yvhJ), Ycr59c/YigZ homolog (yvhK), histid
ine kinase (degS), transcriptional regulator of degradation e
nzyme (degU), (degV), (comFA), (comFB), (comFC), flagellar p
rotein (yviB), negative regulator of flagellin (flgM), flage
llar protein (yviC), flagellar-hook associated protein 1 (fl
gK), flagellar-hook associated protein 3 (flgL), (yviE), tra
nsmembrane protein (yviF), (csrA), flagellin (hag), flagella
r protein (yviH), flagellar hook-associated protein 2 (fliD)
, flagellar protein (fliS), flagellar protein (fliT), sigma-
54 modulator homolog (yviI), and (secA) genes, complete cds.
NID: g1762326.

gtgatccgttcgatacataaagatgcaactcataattgttcagcctatactgtcggacca 
gagatgaatattcaaaaggcaaacgacgatggcgaaccaagtggaacagctggcatccca 

atgcttgaaatactgaaaaaacaagagatacacaatgtttgtgtcgtcgtgacacgctac
ttcggtggtatcaagttaggtgcaggcggtcttattagagcatatagcggcgccgtgcgt 
gatgtgatatatgatataggtagagtcgaactaagagaagctattccagtaaccgttacg 
ttagattatgatcagacaggtaaatttgaatatgaacttgcctctactacattcttatta 
agagaacaattttataccgataaagtaagttatcaaattgacgtagtaaaaaatgaatat 

gatgcttttatagactttttaaatcgaattacttctggaaattatgatttgaaacaagaa
gaccttaaactattaccttttgatattgaaaccaattaa 

Sequence 2134 

VIRSIHKDATHNCSAYTVGPEMNIQKANDDGEPSGTAGIPMLEILKKQEIHNVCVVVTRY
FGGIKLGAGGLIRAYSGAVRDVIYDIGRVELREAIPVTVTLDYDQTGKFEYELASTTFLL
REQFYTDKVSYQIDVVKNEYDAFIDFLNRITSGNYDLKQEDLKLLPFDIETN*

Sequence 2135 
Contig_0714_pos_6433_5357,

is similar to (with p-value 2.0e-25)

>sp:sp|P54595|YHCK_BACSU HYPOTHETICAL 40.7 KD PROTEIN IN CSP
B-GLPP INTERGENIC REGION. >gp:gp|X96983|BS75DGREG_12 B.subti
lis chromosomal DNA (region 75 degrees: cspB upstream of glp
PFKD operon). NID: g1239975. >gp:gp|Z99108|BSUB0005_180 Baci
llus subtilis complete genome (section 5 of 21): from 802821
to 1011250. NID: g2633055.
atggaaatgtttgaagctatcatatataacatatctgtcatggtggcaggtatatattta 
tttcataggttacaatattctgaaaataaaagaatgattttttctaaagaatatgtaaca 
gtactaatgacattcgtttctttacttttagcggcataccctatcccatttcaaaacgaa 

tacctcgtccatttaacatttgtacctcttttgtttttaggacgttataccaacatgata
tatacactcacggctgcttttatcgtatctttagtcgatgtatttatctttggaaactca 
att:a*^':tr. tggtattacattaatcgttattgcaggtattgtcagtgcagtgggacfiittc 
ttaaagcaaaacgatatcatttctttacttattttaaatttgattagcattatcaLtttg 
ttatttttagcattattaagccctatttatgaactcgtagagattttagtgcttatccct 

atttcatttattattacaattgcttcagcaataacattcgttgatatatggcactttttc
tctttagtcaatcgttatgaaaatgaagataaatacgattatcttacaggtctaggtaat 
gtgaaagaatttgatagacacttaaatgaggtctcaagtaaagctgaagaaaagaaacaa 
agtttagccttacttctcattgatattgatggctttaaagatgtaaacgatcattattca 
caccaatcaggagatgctgttctcaaacaaatgtctcaactattaaaaaactatgtccca
aaccagttcaaaatatttagaaacggtggcgaagaattttctgttgtaataagagattac
acactagatcaaagcgtgaaattagcagaaaatattcgaagtggtgttgaaaaatcttct 
ttccacctaccaaacaaagaagtaatcaagctatcagtttcaattggtgtaggatactta 
acvcsaqaagatcgtaaatctcaacgtaaagtatttaaagatgctgatgacatgg'iacat 
gtggctaaaagtgaaggaagaaataaagtcatgtttaatcctattgtcaaattataa

Sequence 2136 

MEMFEAIIYNISVMVAGIYLFHRLQYSENKRMIFSKEYVTVLMTFVSLLLAAYPIPFQNE 
YLVHLTFVPLLFLGRYTNMIYTLTAAFIVSLVDVFIFGNSIIYGITLIVIAGIVSAVGPF 
LKQNDIISLLILNLISIIILLFLALLSPIYELVEILVLIPISFIITIASAITFVDIWHFF
SLVNRYENEDKYDYLTGLGNVKEFDRHLNEVSSKAEEKKQSLALLLIDIDGFKDVNDHYS
HQSGDAVLKQMSQLLKNYVPNQFKIFRNGGEEFSVVIRDYTLDQSVKLAENIRSGVEKSS
FHLPNKEVIKLSVSIGVGYLTQEDRKSQRKVFKDADDMVHVAKSEGRNKVMFNPIVKL* 

Sequence 2137

Contig_0714_pos_5051_4662,

putative peptide of unknown function 

atgaatcgtattgcccatagttatggtttacatgatacatacagttttgtgacatcaact 
gcaattattttctcattaaatgatcgtactagtacgaggttgattcgtattcgcgaacgt 
acaaccgatcttgagaaaattgctttaaccaatagcctatctcgtaaaatttcgagtaag
caacttacaattgacgaagcaaaaagtgagttactgcaacttaaacgtgcgtctcttcag 
tattctttcttaacaaatctcattgctgcctttgtagcttgtggttttttcttattcatg 
tttggtggcgtagcttccgacgcttggattgcatgcctagcgggtggcatagctttttta 
acgtttagtttcgtgcacttttctatctaa 


Sequence 2138 

MNRIAHSYGLHDTYSFVTSTAIIFSLNDRTSTRLIRIRERTTDLEKIALTNSLSRKISSK 
QLTIDEAKSELLQLKRASLQYSFLTNLIAAFVACGFFLFMFGGVASDAWIACLAGGIAFL 
TFSFVHFSI* 


Sequence 2139 

Contig_0714_pos_2877_2557,

is similar to (with p-value 5.0e-30) 

>sp:sp|P54453|YQEH_BACSU HYPOTHETICAL 41.0 KD PROTEIN IN NUC
B-AROD INTERGENIC REGION. >gp:gp|D84432|BACJH642_92 Bacillus
subtilis DNA, 283 Kb region containing skin element. NID: g
2627063. >gp:gp|Z99117|BSUB0014_47 Bacillus subtilis complet
e genome (section 14 of 21): from 2599451 to 2812870. NID: g
2634966.

atgatacctggtgtatcaaacataaatgatttttcgtctaatggaatatctatcatatct
aaagttgttcctggaaagcgtgatgtagttactacatctttttctcccacactctgttca 
attaatttattaattaatgtagattttccaacattcgttgtacctacaatgtatacqtca 
tctttatttcttacatggtttatagattgcaataattcatcaatcccccaacctttottt 
gcagaaataagaacgacatcttctgcttctaatccatatttacgagcagattttctcaac 

cattcttttacacgtcgatga

Sequence 2140

MIPGVSNINDFSSNGISIISKVVPGKRDVVTTSFSPTLCSINLLINVDFPTFVVPTMYTS 
SLFLTWFIDCNNSSIPQPLFAEIRTTSSASNPYLRADFLNHSFTRR* 


Sequence 2141 

Contig_0714_pos_644_213,

is similar to (with p-value 1.0e-21)

>gp:gp|D50453|D50453_106 Bacillus subtilis DNA for 25-36 deg
ree region containing the amyE-srfA region, complete cds. NI
D: g1805369.

atgagaattgcacctaatgtgattggtagaatccaaccattaatcgcaccagctattata 
agtaaactcaccggtttaccaataaataagaaaacaaaagttgaaattacaataaatgta 
ataacgataagattatttttattgagtaacgatttgtgtagtgtttttaaaaatgttgcg 
cttgtatatgcagaaccaattactgaggacattgctgctgcaaatattactacgccaaaa
atatttttacctataggacctaatgcatgttggaaaactgatgctggtggattttctgaa 
ctaagcgtaacgccagttacaacaacacctagtacagctaaaaacaataaggtgcgcatg 
acaccagttgttaaaatacctgctacagcagatcgatttacgaaaggaaggtatgactta 
ccttttatataa

Sequence 2142 

MRIAPNVIGRIQPLIAPAIISKLTGLPINKKTKVEITINVITIRLFLLSNDLCSVFKNVA 
LVYAEPITEDIAAANITTPKIFLPIGPNACWKTDAGGFSELSVTPVTTTPSTAKNNKVRM 
TPVVKIPATADRFTKGRYDLPFI*

Sequence 2143 

Contig_0715_pos_1038_1508, 

is similar to (with p-value 4.0e-35) 

>sp:sp|Q02134|HIS7_LACLA IMIDAZOLEGLYCEROL-PHOSPHATE DEHYDRA
TASE (EC 4.2.1.19) (IGPD). >pir:pir|G45734|G45734 HisB - Lac
tococcus lactis subsp. lactis >gp:gp|U92974|LLU92974_6 Lacto
coccus lactis unknown gene, partial cds, and HisC (hisC), un
known, HisG (hisG), unknown, HisB (hisB), unknown, HisH (his
h), HisA (hisA), HisF (hisF), HisIE (hisIE), unknown, unknow
n, LeuA (leuA), LeuB (leuB), LeuC (leuC), LeuD (leuD), unkno
wn, IlvD (ilvD), IlvB (ilvB), IlvN, IlvC (ilvC), IlvA (ilvA)
, AldB (aldB) and aldR (aldR) genes, complete cds. NID: g256
5137.

atgttaacgctatttacttttcatagtggattaactttatctattgaggccactggagat
acgtatgttgatgatcatcatataactgaagatataggtatagttattggacaattactt 
cttgaattaataaagactcaacaaagttttacaagatatggttgctcatatgtacccatg 
gatgaggcgcttgctcgaacagtagtggacattagtggtcgtccatatttctcatttaat 
agcaagttgagcgctcaaaaggtaggaacttttgacactgaactagttgaagaatttttt 

agagcattgataattaatgcgcgattaaccgttcacattgacttattaagaggtggaaat
acacatcatgagattgaggcaatatttaaatcttttgcaagagcattaaagatttctctt 
gcacaaaatgaagatggacgtattccatcgtctaaaggagtaattgaatga 

Sequence 2144

MLTLFTFHSGLTLSIEATGDTYVDDHHITEDIGIVIGQLLLELIKTQQSFTRYGCSYVPM
DEALARTVVDISGRPYFSFNSKLSAQKVGTFDTELVEEFFRALIINARLTVHIDLLRGGN 
THHEIEAIFKSFARALKISLAQNEDGRIPSSKGVIE* 

Sequence 2145 
Contig_0715_pos_1607_1990,

is similar to (with p-value 1.0e-21)

>sp:sp|Q02132|HIS5_LACLA AMIDOTRANSFERASE HISH (EC 2.4.2.-).
>pir:pir|I45734|I45734 HisH - Lactococcus lactis subsp. lac
tis >gp:gp|U92974|LLU92974_8 Lactococcus lactis unknown gene
, partial cds, and HisC (hisC), unknown, HisG (hisG), unknow
n, HisB (hisB), unknown, HisH (hish), HisA (hisA), HisF (his
F), HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB
), LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilv
B), IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (al
dR) genes, complete cds. NID: g2565137.

gtgcaaaaagctgaagctatcgtacttccaggtgttggacattttcaggatgcgatgcat 
tctatagaagaaaaaagcattaaagatatgcttaaaaatatacatgataaaccgataatt 
ggaatatgtttaggtatgcaattactttttcaacatagcgcagaaggtgacgttagtgga
ttggaacttgtcccgggaaatatagtgccaatccaatcatctcatcctattcctcar:ttg 

ggttggaatgaattaaagagtacacatcccttactgcaaagtgatgtgtattttgttcat
tcatatcaagcagaaatgtcagaatatgtagtagcttatgctgactatggtacaaagatt 
ccgggagtcattcaataccgatga 

Sequence 2146 
VQKAEAIVLPGVGHFQDAMHSIEEKSIKDMLKNIHDKPIIGICLGMQLLFQHSAEGDVSG
LELVPGNIVPIQSSHPIPHLGWNELKSTHPLLQSDVYFVHSYQAEMSEYVVAYADYGTKI
PGVIQYR* 

Sequence 2147

Contig_0715_pos_2067_2615,

is similar to (with p-value 9.0e-31)

>sp:sp|P54471|YQFN_BACSU HYPOTHETICAL 23.7 KD PROTEIN IN CCC
A-SODA INTERGENIC REGION. >gp:gp|D84432|BACJH642_139 Bacillu
s subtilis DNA, 283 Kb region containing skin element. NID:
g2627063. >gp:gp|Z99116|BSUB0013_228 Bacillus subtilis compl
ete genome (section 13 of 21): from 2395261 to 2613730. NID:
g2634723.

gtgattcaaggaccttataaggctgctaaaagaaatattgcaaattatgaattaaatcaa 
caggttgatgtacgtctaggcgatggtctaagcgttataaacacagaagaccaaattgat
aatataactgtttgtggtatgggagggccattaattgcaaaaatattaaacgatggaaaa 
gataaattagttaaccatccaagactcatactacaaagcaacatacaaactcaagcatta
agacaaactcttaataaactttcatatgaaatcgttgatgaaagaatcattgaggaaaag 
ggtcacatatatgaaatcgtggtagctgagtttaataataacttagttaaattaaatata 
ttacaagaaaaattcggaccatttttacttagagaatgtaataacatttttcaaaaaaaa
tggcaaagagagttagaagcactgcgtgatataaaatcccaattgaattcaacatcacat 
catgagagactaaaagaaatagaagatgaaattaacttaatacaagaggtgttaattaat 
gaaaattag 

Sequence 2148

VIQGPYKAAKRNIANYELNQQVDVRLGDGLSVINTEDQIDNITVCGMGGPLIAKILNDGK
DKLVNHPRLILQSNIQTQALRQTLNKLSYEIVDERIIEEKGHIYEIVVAEFNNNLVKLNI
LQEKFGPFLLRECNNIFQKKWQRELEALRDIKSQLNSTSHHERLKEIEDEINLIQEVLIN 
EN* 


Sequence 2149 

Contig_0715_pos_2623_3705,

is similar to (with p-value 2.0e-54)

>sp:sp|P53434|YRP2_LISMO HYPOTHETICAL 41.4 KD PROTEIN IN RPO
D 3'REGION (ORFA2). >gp:gp|U17284|LMU17284_3 Listeria monocy
togenes major sigma factor (rpoD) gene, partial cds, and dow
nstream orfA1 and orfA2 genes, complete cds. NID: g687597.
atggaagttttaaataatcacgttccatttcatcaagctgaatcatgggataatgtagga 
ttattaattggtaatgataagttagatattacaggtattctgacaacactcgactgcacc 

gatgatgttgttaaccaagcaattgaacttaataccaataccatcattgctcatcatcca
cttattttcaaaggagtaaaacgtatcgttgaagatggatatggtagtataattcgtaaa 
cttatccaaaataatatcaatcttatagcattacacactaatcttgatgtaaatcctaaa 
ggtgtcaatcgaatgttagcggatcaaataggtttagagaacatatcaatgattaataca 
aatagctcatattattacaaagttcaaacttttatacctaaaaattatattgaagatttc 

aaagacagtttaaacgaacttggattagctaaagaaggtaattacgaatattgtttcttt
gaaagtgaaggtaaagggcaatttaaaccagtaggtgatgcaagtccttatatagggaag 
ttagatagtatcgaatatgttgatgaaataaaacttgagtttatgataaaagacaatgaa 
ttagaaataactaaacgtgctattttagataatcacccatacgaaacaccagtttttgat 
tttattaaaatgaacaaagaaagtgagtatggattagggattattggacaattaaaccaa 

actatgactttagatgaattttctgaatatgccaaaaaacagctcaatataccgagcgta
cgatatacaggtcaacatgatagtccaattaagaaagtagctatcataggtggttcaggt 
ataggatttgagtataaagctagccaacttggagcagatgtttttgttactggtgatatt 
aaacaccatgatgctttagatgctaaaatccaaaatgtaaatttattagacatcaatcat 
tatagtgagtatgttatgaaagaaggattaaaagaattattagaaaaatggttatttaaa

tatgaaaatcaatttccaatatatgcttctgaaatcaacacagatccatttaaatataaa
taa 

Sequence 2150 

MEVLNNHVPFHQAESWDNVGLLIGNDKLDITGILTTLDCTDDVVNQAIELNTNTIIAHHP 
LIFKGVKRIVEDGYGSIIRKLIQNNINLIALHTNLDVNPKGVNRMLADQIGLENISMINT

NSSYYYKVQTFIPKNYIEDFKDSLNELGLAKEGNYEYCFFESEGKGQFKPVGDASPYIGK 

LDSIEYVDEIKLEFMIKDNELEITKRAILDNHPYETPVFDFIKMNKESEYGLGIIGQLNQ 

TMTLDEFSEYAKKQLNIPSVRYTGQHDSPIKKVAIIGGSGIGFEYKASQLGADVFVTGDI 

KHHDALDAKIQNVNLLDINHYSEYVMKEGLKELLEKWLFKYENQFPIYASEINTDPFKYK
* 

Sequence 2151 

Contig_0715_pos_3740_4072,

putative peptide of unknown function

atgtcaaaacatccatttgaacactttaatttagatgagaatttaattgaagctgttaaa 
aatctcaattttgaaaaaccgactgaaatccaaaatagaatcataccgagaattcttaaa 
ggaacaaatttaataggacaatctcaaactggaactggaaagtcacacgcttttctttta 
atattgatggagatcgttctttttctgaagcagtacaattggacatacaagtacctggtg 

cttttcctaaatttggaactacagatggaacgattacgaaaaatgttgtcattgaaaaat
gttattttggctgttcagatcatcctaaaatga 

Sequence 2152

MSKHPFEHFNLDENLIEAVKNLNFEKPTEIQNRIIPRILKGTNLIGQSQTGTGKSHAFLL
ILMEIVLFLKQYNWTYKYLVLFLNLELQMERLRKMLSLKNVILAVQIILK*

Sequence 2153 

Contig_0717_pos_4083_5096, 

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 
gcttgtttaggaccgacgcttaaacaaacagacaacttacctatacatgagttaatattc 
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattatttagt

agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg 
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat 
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 
aaacaatctattcatcatatcaaacaattagatgatgccatgattcaattagcacaacaa 

ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg
attattgggaagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt 
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcvty.jtsataaaaaagcgagaaaacttttattttgggtgattatgaatataaraaga 
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct

aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt
cattatcttgtaatgaattataaattgtacgattatcaaatgtcaccacattag 

Sequence 2154 

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDNLPIHELIF 
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL 
KQSIHHIKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGKIGDIKRFKSNKQLNAF 
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP 
NEKPHKTAIIACINRLLKTIHYLVMNYKLYDYQMSPH*


Sequence 2155 

Contig_0717_pos_8693_0,

is similar to (with p-value 1.0e-49)

>sp:sp|P42423|YXDL_BACSU HYPOTHETICAL ABC TRANSPORTER ATP-BI
NDING PROTEIN IN IDH 3'REGION. >gp:gp|D14399|BACIOLO_13 Baci
llus subtilis 15 kb chromosome segment contains the iol oper
on. NID: g709980. >gp:gp|Z99124|BSUB0021_68 Bacillus subtili
s complete genome (section 21 of 21): from 3999281 to 421481
4. NID: g2636442. >gp:gp|D45912|D45912_2 Bacillus subtilis g
enome sequence between the iol and hut operon, partial and c
omplete cds. NID: g1408482.

atgggtccttctggatcaggtaaaacgactttactcaatgtgttaagttcaatagatact 
atttcagaaggaactgtggaagttgaaggcaaagaaattaataaactgagccacaaagaa 

gtggcaaattttcgaaaacaacatctcggttttatttttcaagattatagcgttttaccc
acattaacagtaaaagaaaatattatgctaccactctcagtacaaaaattccataaatat 
gaaatggaacaaaattataaagaagtggctgaggcattaggtatttataacctgggaaat 
aaatatccaagtgaaatttctggcggtcagcaacaacgtacggcggcagcccgggcattc 
gtrcataaaccaacgattattttcgcagatgaacctactggcgcattagattctaaaagt

gctcaagatttgttacaccgtctagaagatatgaataaacaatttaattcaaccattatg
atggtgacacatgatccttcagccgctagttacgctgagagagtcattatgttgaaagac 
ggtgatatacactcagaaatctaccagggtaacgattcaaaacaaacattttaccaagaa 
attatgaaacttcc 

Sequence 2156

MGPSGSGKTTLLNVLSSIDTISEGTVEVEGKEINKLSHKEVANFRKQHLGFIFQDYSVLP 
TLTVKENIMLPLSVQKFHKYEMEQNYKEVAEALGIYNLGNKYPSEISGGQQQRTAAARAF 
VHKPTIIFADEPTGALDSKSAQDLLHRLEDMNKQFNSTIMMVTHDPSAASYAERVIMLKD 
GDIHSEIYQGNDSKQTFYQEIMKLP 


Sequence 2157 

Contig_0717_pos_8309_8007,

putative peptide of unknown function 

atggagaaaaggagtattaatatgaaaaaagtatttatgatcataagtatacttaccata 
actgttactttaagtgcatgtggaggttctggaaaacaaaaagagccatctaaggaaagt
caaaaatctgataaatatgattatgtttattatgaaatattaaatgatggagattctgaa 
acgccaaatgttgagattaaatataaagataaaaaaggtaaatcacatatagaaaaagct 
gatttagatcacgtgtatgaacatatactaggtgatggtaataaaaaaccatatatgata 
tga 


Sequence 2158 

MEKRSINMKKVFMIISILTITVTLSACGGSGKQKEPSKESQKSDKYDYVYYEILNDGDSE 
TPNVEIKYKDKKGKSHIEKADLDHVYEHILGDGNKKPYMI* 

Sequence 2159

Contig_0717_pos_3267_2203, 

is similar to (with p-value 0.0e+00)

>sp:sp|P38021|OAT_BACSU ORNITHINE AMINOTRANSFERASE (EC 2.6.1
.13) (ORNITHINE--OXO-ACID AMINOTRANSFERASE). >pir:pir|B53370
|B53370 ornithine aminotransferase - Bacillus subtilis >gp:g
p|D78193|BACGNTZA_27 Bacillus subtilis 36kb sequence between
gntZ and trnY genes encoding 34 ORFs. NID: g1064780. >gp:gp
|X81802|BSROCDEF_1 B. subtilis rocD, rocE and rocF genes. NID
: g550310. >gp:gp|Z99124|BSUB0021_139 Bacillus subtilis comp
lete genome (section 21 of 21): from 3999281 to 4214814. NID
: g2636442.

atggatatgctttcggcctactcggcagtgaatcaaggtcatcgacacccaagaattatt 
caagcattgaaagatcaagcagataaagtcactttagtatcacgtgcttttcatagtgat 
aatctgggtcaatggtatgagaaaatatgtaaactcgcaggtaaagacaaagcattgcct

atgaatacgggagcagaggcggttgaaacagctttaaaagctgctcgtcgttgggcttat
gatgttaagggtattgagccgaacaaagctgaaattatcgcttttaatggtaatttccat 
ggacgtacgatggcaccagtatcattgtcttcagaagctgagtatcaacgaggctatggt
ccattgttagatggctttcgaaaagttgagtttggtgacgttaatcaattaaaagcagca
attaataaaaatacagcagcaattttagtagaacctatacagggagaagcagggattaac 

gtaccaccagaaggatatttgaaaacaattagagaattatgtgatgaacatcaaatttta
tttattgctgatgaaattcaagcaggattaggacgttcaggaaaattatttgcaacggat 
tgggatcatgtaaaaccggatgtttatattttaggaaaagcgttaggtggaggggtattt 
cctatctcggtagttcttgcagataatgaggtattagatgtatttactcctggctcacat 
ggttctacatttggtggaaatccactagcgagtgcagtttctattgcagctatagatgtc 
at:a:..tQv^.cgaggatttacctggtcgttcattagaattaggagaatattttaagt*:tgaa
ttgaaaaaaattgagcatccatctattaaagaagttaggggacgaggattatttatcggt 
attgaattacatgaaagtgccagaccatattgtgaagctttgaaagaacaaggattatta 
tgtaaagaaactcacgacaccgttattagatttgcacctccattagtgataacgaaagaa 
gagttagacatggctttagagaagattaagagtgtatttgcatag



Sequence 2160 

MDMLSAYSAVNQGHRHPRIIQALKDQADKVTLVSRAFHSDNLGQWYEKICKLAGKDKALP
MNTGAEAVETALKAARRWAYDVKGIEPNKAEIIAFNGNFHGRTMAPVSLSSEAEYQRGYG
PLLDGFRKVEFGDVNQLKAAINKNTAAILVEPIQGEAGINVPPEGYLKTIRELCDEHQIL
FIADEIQAGLGRSGKLFATDWDHVKPDVYILGKALGGGVFPISVVLADNEVLDVFTPGSH
GSTFGGNPLASAVSIAAIDVIIDEDLPGRSLELGEYFKSELKKIEHPSIKEVRGRGLFIG
IELHESARPYCEALKEQGLLCKETHDTVIRFAPPLVITKEELDMALEKIKSVFA*

Sequence 2161

Contig_0717_pos_1974_850,

is similar to (with p-value 0.0e+00)

>sp:sp|P50735|YPCA_BACSU HYPOTHETICAL 46.7 KD OXIDOREDUCTASE
IN RECQ-CMK INTERGENIC REGION (EC 1.4.1.-). >gp:gp|Z99115|B
SUB0012_236 Bacillus subtilis complete genome (section 12 of
21): from 2195541 to 2409220. NID: g2634478. >gp:gp|Z99116|
BSUB0013_8 Bacillus subtilis complete genome (section 13 of
21): from 2395261 to 2613730. NID: g2634723. >gp:gp|L47648|B
ACSERA_11 Bacillus subtilis phosphoglycerate dehydrogenase (
serA), ypaA, ferredoxin (fer), ypbB, recS, ypbD, ypbE, ypbF,
ypbG, ypbH, glutamate dehydrogenase (ypcA), ypdA, ypdB, ypd
C, spore cortex lytic enzyme (sleB), ypeB, ypfA, ypfB, cytid
ine monophosphate kinase (cmk), ypfD, ypgA, yphA, yphB, yphC
, NAD+ dependent glycerol-3-phosphate dehydrogenase (glyc),
yphE and yphF genes, complete cds. NID: g1146195. >gp:gp|L47
648|BACSERA_11 Bacillus subtilis phosphoglycerate dehydrogen
ase (serA), ypaA, ferredoxin (fer), ypbB, recS, ypbD, ypbE,
ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA), ypdA, ypdB
, ypdC, spore cortex lytic enzyme (sleB), ypeB, ypfA, ypfB,
cytidine monophosphate kinase (cmk), ypfD, ypgA, yphA, yphB,
yphC, NAD+ dependent glycerol-3-phosphate dehydrogenase (gl
yc), yphE and yphF genes, complete cds. NID: g1146195.
gtgcgcattccagtacgaatggacgatgggacagttaaaacatttactggataccgtgca 
caacataatgatgcagttggccctactaaaggtggcgttcgtttccatccagaagttgat 

40 gaagaagaagtgaaggcattatcaatgtggatgactttaaaatgtggcattgttaF.^jtta 
ccttatcutgggggtaaaggcggtattgtttgtgatccacgtcaaatgagtatac^'cgaa 
gttgagcgtctatctcgagggtatgtaagagctatatctcaatttgttgggcctaacaaa 
gatattccagcaccagacgtttttacaaactctcaaattatggcttggatgatggatgag 
tatagtgcattagataaatttaattcgccaggttttataacaggcaaacctattgtatta 

ggtggatctcaaggacgtgatcgctctacagctttaggtgtagtcattgcaattgaacaa
gcggcaaaacgtcgtggcatggatattaaagatgccaaaattgtgattcaaggtttcggc
aatgcaggtagtttcttagctaaattcttatacgatttaggtgctaaagtagttggtata 
agtgatgcatatggagccttacatgaccctaatggacttgatatagattatttattagat 
cgacgtgatagcttcggtacagttacaaatctattcgaagatacaatttctaacaaagaa 

ttattcgaattggattgtgacatccttgttcctgctgcaatttccaatcaaatcactgaa
gataatgcgcacgatattaaagcaagcattgtagttgaagctgctaatggccctacgacg 
cctgaagcaacacgtattttaacagaaagagatatattactagtgccggatgtacttgca 
agtgcaggaggtgtgactgtatcttatttcgagtgggtacaaaataatcaaggttattat 
tggactgaagaagaagtcaatgacaaactacgtgagaagttagtaacagcatttgatacg 

atttatgaattgtcacaaaatagaaaaattgatatgagattagcagcatatatagtaggt
attaaacgtactgctgaagcagcaagatatagaggttgggcataa 

Sequence 2162 

VRIPVRMDDGTVKTFTGYRAQHNDAVGPTKGGVRFHPEVDEEEVKALSMWMTLKCGIVNL 
PYGGGKGGIVCDPRQMSIHEVERLSRGYVRAISQFVGPNKDIPAPDVFTNSQIMAWMMDE
YSALDKFNSPGFITGKPIVLGGSQGRDRSTALGVVIAIEQAAKRRGMDIKDAKIVIQGFG

NAGSFLAKFLYDLGAKVVGISDAYGALHDPNGLDIDYLLDRRDSFGTVTNLFEDTISNKE 
LFELDCDILVPAAISNQITEDNAHDIKASIVVEAANGPTTPEATRILTERDILLVPDVLA 
SAGGVTVSYFEWVQNNQGYYWTEEEVNDKLREKLVTAFDTIYELSQNRKIDMRLAAYIVG
IKRTAEAARYRGWA* 

Sequence 2163 
Contig_0718_pos_4955_5407,

putative peptide of unknown function

atgtatgatgaattagaggttaataaaagtcgttttaaaaactgtaattttaatgaaggt 
atttttaagaatatagaagcaatttgtaattgtaaatttacaacgtgcgggtttaataat 
tgtattttcgaagatgttcatttttacaaaaaccaatttaaagattcaacatttgtgaat 
acaccatttgatcaatccgtatttaatagtactttattccaaaatgcaatgttcgatagc 

aatctcattcgtagcgtaaaatggactgatatcatttttaaaaacgtttctttcaaaaat
gtagaaattgaaggaacaacatttaaagatgtaaaattcaaaaattgtgagttcaaaaat 
gtaattattactaattcaactatgtcgcaaaagttaatgaatgaattacaaaaacaagat 
gttactttagaaaatatagacacttctatttaa 

Sequence 2164

MYDELEVNKSRFKNCNFNEGIFKNIEAICNCKFTTCGFNNCIFEDVHFYKNQFKDSTFVN
TPFDQSVFNSTLFQNAMFDSNLIRSVKWTDIIFKNVSFKNVEIEGTTFKDVKFKNCEFKN 
VIITNSTMSQKLMNELQKQDVTLENIDTSI* 

Sequence 2165

Contig_0718_pos_8056_7316,

is similar to (with p-value 3.0e-34) 

>sp:sp|Q06174|EST_BACST CARBOXYLESTERASE PRECURSOR (EC 3.1.1
.1). >pir:pir|JC1374|JC1374 carboxylesterase (EC 3.1.1.1) -
Bacillus stearothermophilus (strain IFO 12550) >gp:gp|D12681
|BACPBH7_1 Bacillus stearothermophilus esterase gene. NID: g
216313.

gtgaatgtaaaaatgaaagttaaatcaccacaatcaatctacttaaaaggacatcgtcaa
caagctgtattgttattacattcttttacgggaactgtacgtgatgtaaaacatttagca 

caacagttgaatgaagagggatttacttgttacgtgcctagttatccaggccacggtttg
ccacttaaggaatttacccaacacaatattaatgattggtgggaacaagttacagcagca 
tatcaatttttaagaaatgcaggatacagtagaattaatgtgacaggcgtatcattaggc 
ggattatttactttaaggttagctgaacattttgatttagaacgtatagctgtgatgtca 
gccccacataaaaagcgtgaaagcgagattgcgtggcgtcttgaaaggtatgggcatcga 

atgaatgaaattttgagtttaagcgaagaagagcgtcgtcaccaaatggaaaccatcttg
tcttatgataaagaaattgaagtgtttcaaggtgtaattgatgaaattatggcttatctt 
gcaaatattacagtaccagtgaatattatgtatggcgaagaagatgacccattatatgct 
caaagtgcgcaatacatttatgataatgtaaatagtcaagataaagaactgctcaaattt 
gaaaaaagcggtcatcttatgacgtatggcgatcatgcatacagagtagaacaatctatt 

attcaatttttcagtaaataa

Sequence 2166 

VNVKMKVKSPQSIYLKGHRQQAVLLLHSFTGTVRDVKHLAQQLNEEGFTCYVPSYPGHGL 
PLKEFTQHNINDWWEQVTAAYQFLRNAGYSRINVTGVSLGGLFTLRLAEHFDLERIAVMS 
APHKKRESEIAWRLERYGHRMNEILSLSEEERRHQMETILSYDKEIEVFQGVIDEIMAYL
ANITVPVNIMYGEEDDPLYAQSAQYIYDNVNSQDKELLKFEKSGHLMTYGDHAYRVEQSI
IQFFSK*

Sequence 2167 
Contig_0718_pos_6737_5586,

putative peptide of unknown function 

atggaatttacggtaggtaaaatgggtcgtacatatacaacgcaaatatataagaaatta 
acgggaaagaaatggcttaatattatcggatggaatggtaatttagccgtatttatacta 
tttggtttttatagtgttattggtggttggattattatatatataggttatgtcatagca 



WO 01/34809
PCT/US00/30782



caaatcatggtttttaaatcaagtacgctgacaaatattcaatttgaaacaatcattagt 
aatccatggttgactgttttaggtcaaggcatatttattttgataacaatggtaattgtt 
atgttaggtgttgaaaaaggtttagaaaaagcttctaaaataatgatgcctctattattt 
atct c tt>.aattatcgttgtagcacaatctttaactttagaaggtgctttagaag<;i tgta 
cgttatatactgcaacctcgagttgaagatatgtctattcaaggtgtactatttgcgtta
ggacaatcgttttttacgctgtccctaggtacaaccggaatgattacttatgcaagctat 
gcacctaaaaatatgacgataaagtcttcagcactttcaattgtcgtaatgaatatttta 
atttctgtcttggctggattagctatatttcctgcgcttaaaacatttggttaccaaccc 
caagaaggccctggcttattatttaaggttttaccactagtatttagcgaaatgactttt 

ggtacattcttttactttatatttttactattattcttatttgcggcattaacgtcttct
atatcattattagagttaaatgtatctaattttactaaaaatgataatagtaaaagacaa 
aaagtggcaatcataggtagtatacttgtatttatcattagtatcccagcaacattatct 
tttagtagtctaagtcatttgcgttttggcgctggtacgatatttgataatatggatttt 
attgtatctaatattcttatgccattaggggcactaggaacaacattagtggttggccaa 

ttactagataaaaaattattaaaagaaagctttgggaaagacaaattcaacctattttta
ccgtggtattatttaattaagttcatcatgcctattgttattattttagtatttatagtt 
caattattttaa 

Sequence 2168

MEFTVGKMGRTYTTQIYKKLTGKKWLNIIGWNGNLAVFILFGFYSVIGGWIIIYIGYVIA
QIMVFKSSTLTNIQFETIISNPWLTVLGQGIFILITMVIVMLGVEKGLEKASKIMMPLLF
IFLIIVVAQSLTLEGALEGVRYILQPRVEDMSIQGVLFALGQSFFTLSLGTTGMITYASY
APKNMTIKSSALSIVVMNILISVLAGLAIFPALKTFGYQPQEGPGLLFKVLPLVFSEMTF
GTFFYFIFLLLFLFAALTSSISLLELNVSNFTKNDNSKRQKVAIIGSILVFIISIPATLS
FSSLSHLRFGAGTIFDNMDFIVSNILMPLGALGTTLVVGQLLDKKLLKESFGKDKFNLFL
PWYYLIKFIMPIVIILVFIVQLF*

Sequence 2169 

Contig_0718_pos_4565_3657,

is similar to (with p-value 9.0e-76)

>gp:gp|U93874|BSU93874_1 Bacillus subtilis cysteine synthase
(yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), YrhD
(yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF),
formate dehydrogenase (yrhG), YrhH (yrhH), regulatory protei
n (yrhI), cytochrome P450 102 (yrhJ), YrhK (yrhK), hypotheti
cal protein YrhL (yrhL), putative anti-SigV factor (yrhM), R
NA polymerase sigma factor SigV (sigV) and YrhO (yrhO) genes
, complete cds, and YrhP (yrhP) gene, partial cds. NID: g193
4604. >gp:gp|Z99117|BSUB0014_206 Bacillus subtilis complete
genome (section 14 of 21): from 2599451 to 2812870. NID: g26
34966.

atgattgcatacgatttgataggacaaactccattagttttattagaaagctttagtgac 
gagaatgttaaaatatacgccaaacttgagcaatttaatcctggtggtagcatcaaagac
cgtctagggaagtacttaattgaaaaagcaatagatgaaggacgacttaaagaaggggat 

acaatagtcgaagcgactgctggtaatacaggcattggacttgctattgcttctaatcgg
cacaaagtaaaatgtatcatctttgctccagaaggatttgcagaagaaaaaatttcaatt 
atgaaagcattgggtgcagatgttagacgtacccccaaagctgagggaatgactggcgca
cagcaagaggcgttggcatacgcaacacgatatggatatttatatatgaatcaattcgaa 
actaaagataatcctggggcatatacacaaacacttgccaaacaactcacagatgaactt

tcacatattgattattttgtggcaggtgttgggtccggtggtacgtttacaggagttgca
caacacttaaaaacgtatgatgtaaaaaattatattgtagaaccagaaggctctgtctta 
aatggtggtgtcagtcatcctcatgcaactgaagggattggttctgaaaagtggccatca 
tttttagaaaaagaattagtagatggtatttttactgttgccgataaagatgcttttaat 
aatgttaaacttgtcgcgaataaagaaggattgttagttggtagttcttcgggagcggca 

ttacaaggagcgttggaattaaaaaaaagcattcaaaatggtgtgattgttaccatcttt
ccagatggaagcgatcgatacatgtccaaacaaatattcaactataaggagagttttaat 
aatgaataa 

Sequence 2170 
MIAYDLIGQTPLVLLESFSDENVKIYAKLEQFNPGGSIKDRLGKYLIEKAIDEGRLKEGD
TIVEATAGNTGIGLAIASNRHKVKCIIFAPEGFAEEKISIMKALGADVRRTPKAEGMTGA 
QQEALAYATRYGYLYMNQFETKDNPGAYTQTLAKQLTDELSHIDYFVAGVGSGGTFTGVA
QHLKTYDVKNYIVEPEGSVLNGGVSHPHATEGIGSEKWPSFLEKELVDGIFTVADKDAFN 
NVKLVANKEGLLVGSSSGAALQGALELKKSIQNGVIVTIFPDGSDRYMSKQIFNYKESFN
NE* 

Sequence 2171 

Contig_0718_pos_3646_2519,

is similar to (with p-value 0.0e+00)

>gp:gp|U93874|BSU93874_2 Bacillus subtilis cysteine synthase
(yrhA), cystathionine gamma-lyase (yrhB), YrhC (yrhC), YrhD
(yrhD), formate dehydrogenase chain A (yrhE), YrhF (yrhF),
formate dehydrogenase (yrhG), YrhH (yrhH), regulatory protei
n (yrhI), cytochrome P450 102 (yrhJ), YrhK (yrhK), hypotheti
cal protein YrhL (yrhL), putative anti-SigV factor (yrhM), R
NA polymerase sigma factor SigV (sigV) and YrhO (yrhO) genes
, complete cds, and YrhP (yrhP) gene, partial cds. NID: g193
4604. >gp:gp|Z99117|BSUB0014_205 Bacillus subtilis complete
genome (section 14 of 21): from 2599451 to 2812870. NID: g26
34966.

atgatacatgggggacatacgacagacaactatactggagcagtgacaacacctatttat 
caaacaagtacttatttacaagatgatattggtgatttaagacaagggtacgaatattca 
cgtactgcaaatcctacacgtgcgtctcttgaaagtgttattgctaatttagaacatggt 

aagcatggttttgcttttggttcaggaatggcagcaattagtgcagttatcatgttatta
gataaaggagatcacttagttcttaattctgatgtttatggtggcacatatcgtgcatta 
actaaagtatttactcgctttggtatagacgtagattttgttgatacaactaaaattgaa 
aacattgaacaatatattaaacctgaaactaaaatgttatatgtagaaacaccttcaaat 
ccattattgcgtgtgactgatattaaagcatcagcaaaaattgcaaaaaaatatgatttg

atatctgtagtcgataatacatttatgacaccttactaccaaaaccctttagactttggt
attgatatcgtattgcattcggctacaaaatatattggaggccatagtgatgttgtagct 
ggtcttgttgctactgctgatgatgatttagcagaacgtctaggctttatttcaaattct 
acaggtggtgtacttggacctcaagatagctatttattaatcagaggtattaaaacgcta 
ggtctaagaatggagcaaataaaccgaaacgttgaaggtattgtgcaaatgttacajtaag 

caccctaaagttcaacaagtattccatcctagtattaaggaacatatgaactatactatc
catcaaaatcaagcaactgggcatacaggggtagtatcttttgaagttaaagatacagaa 
gcggctaaacaagtgattcacgcaacaaactactttacactggcagagagtttaggggca 
gttgaaagtctaatttctgtaccggcacttatgacgcatgcgtccatcccatcagatgta 
agagccaaggaaggtattacggatggtctcattcgtttatctattggtattgaagacaca 

gaagacttagttaatgatttagaacaagccttaaatactttgagataa

Sequence 2172 

MIHGGHTTDNYTGAVTTPIYQTSTYLQDDIGDLRQGYEYSRTANPTRASLESVIANLEHG 
KHGFAFGSGMAAISAVIMLLDKGDHLVLNSDVYGGTYRALTKVFTRFGIDVDFVDTTKIE 
NIEQYIKPETKMLYVETPSNPLLRVTDIKASAKIAKKYDLISVVDNTFMTPYYQNPLDFG
IDIVLHSATKYIGGHSDVVAGLVATADDDLAERLGFISNSTGGVLGPQDSYLLIRGIKTL
GLRMEQINRNVEGIVQMLQKHPKVQQVFHPSIKEHMNYTIHQNQATGHTGVVSFEVKDTE 
AAKQVIHATNYFTLAESLGAVESLISVPALMTHASIPSDVRAKEGITDGLIRLSIGIEDT 
EDLVNDLEQALNTLR* 


Sequence 2173 

Contig_0719_pos_3554_2802, 

is similar to (with p-value 1.0e-71)

>sp:sp|P29928|SUMT_BACME UROPORPHYRIN-III C-METHYLTRANSFERAS
E (EC 2.1.1.107) (UROGEN III METHYLASE) (SUMT) (UROPORPHYRIN
OGEN III METHYLASE) (UROM). >pir:pir|A42479|A42479 S-adenosy
l-L-methionine uroporphyrinogen III methyltransferase - Baci
llus megaterium >gp:gp|M62881|BACCOBA_1 Bacillus megaterium
S-adenosy-L-methionine:uroporphyrinogen III methyltransferas
e (COBA) gene, complete cds. NID: g142694.

gtggttcatatggggaaagtatatttagttggagctggacctggtgatccagaattaata 
acgttaaaaggtttaaaagccattaaagaagccgatgtcatcctttatgaccgacttgta 
aataaagaaatacttaattatgcttctccttctactaagttcttctattgcggtaaggat 
cctcacaggcactccttaccgcaggaagaaacaaataaaatgatggtaaccttagccaaa
aaagggcacatagttacacgtttaaagggtggcgatccatttgtttttggacgtggcgga 
gaagaagcagaggaattagcatgtcataatatccactttgaaattatacctggaattcca
gtaacacatcgtgattatagttcttctgtagcatttgtaactgcagtgaataaacctggt 
atggataaaggcaaatactggcaacatttggccaatggtcctgaaactttatgtatttat 
atgggggttaagagactcagtgaaatttgtgagttgttaatacaatatggtcgttcgtca
gaaacaccagtagctctcgtgcatatgggaacgtcaaaacagcaaatgacagtgactggg 
acactcgatacaattcaagaacgagcacatcatattcagaatccagcaatgattattgta 
ggcgaagtggttaagatgagagaaaaaattaattggtttgtagaacaggcaactgttcaa 
aatgaaacgttaacggaaatgtcatcaacttag 


Sequence 2174 

VVHMGKVYLVGAGPGDPELITLKGLKAIKEADVILYDRLVNKEILNYASPSTKFFYCGKD
PHRHSLPQEETNKMMVTLAKKGHIVTRLKGGDPFVFGRGGEEAEELACHNIHFEIIPGIP 
VTHRDYSSSVAFVTAVNKPGMDKGKYWQHLANGPETLCIYMGVKRLSEICELLIQYGRSS 
ETPVALVHMGTSKQQMTVTGTLDTIQERAHHIQNPAMIIVGEVVKMREKINWFVEQATVQ
NETLTEMSST* 

Sequence 2175 

Contig_0719_pos_2721_2116,

is similar to (with p-value 4.0e-19)

>gp:gp|AJ000974|BSPYREYLO_8 Bacillus subtilis pyrE to yloA g
ene region. NID: g2462954. >gp:gp|Z99112|BSUB0009_34 Bacillu
s subtilis complete genome (section 9 of 21): from 1598421 t
o 1807200. NID: g2633902.

atgcccttaatgattgatttaagtaacaagaaagtcgtcattgtaggtggaggtaaagtg
gcaacacgtcgtgctaaaactttattagcttatacaaaacatattcatgttgtaagtcca 
acaattaccgatacattacaaaaatatctagaaacgaagcaaatcacttatgaaaagaaa
cacttcgaaccacaagatgttgagaatgctgatgtggtcatcgcggctactaatcaatct
gatgttaacaacgatgtgggggcagctttgtctaagaacgtattatttaatcatgcagga 

caagcagacctaggtaatgtaacgttccctaatttcttaaaaagagataaattaacaata
agtgtatcaactgatggtgcaagtcctaaattaggtcaacgaattattaaagatttaaaa 
gatacatacaatgaagactattcaatgtatattcagtttttatatgaaagtagacaatat 
attaaatcacttaaaattgagccatctgataaacaagcgttactcgagcaaattttgtca 
gacaaatatttagatgagaagaagcaacaagatttcatccgatggctaaaatcacaagtc 

aaatga

Sequence 2176 

MPLMIDLSNKKVVIVGGGKVATRRAKTLLAYTKHIHVVSPTITDTLQKYLETKQITYEKK 
HFEPQDVENADVVIAATNQSDVNNDVGAALSKNVLFNHAGQADLGNVTFPNFLKRDKLTI
SVSTDGASPKLGQRIIKDLKDTYNEDYSMYIQFLYESRQYIKSLKIEPSDKQALLEQILS
DKYLDEKKQQDFIRWLKSQVK* 

Sequence 2177 

Contig_0719_pos_2029_1196, 

putative peptide of unknown function

atgggatttggcgcttcatcgtcatcaatattattaacttacggtatagcaccggcagta 
gtgtcagcaaccgttcatttttctgaaattgcaacaacagctgcatctgggacatcacat 
tggagatttgataatgttcataaaccaacaatgttgaagttagctatacctgggtcaata 
agcgcctttatcggtgcaggtgttttgacatttattcatggtgattatattaaaccattc 

attgctttattcttgttaagtatgggattttatattttgtatcaatttctatttaaacgt
gcacatgaacatcatcatcatgtgggaaatttgagtagttttaaagtaattccacaaggt
tttgtggcaggatttttagacgcaatcggtggtggtggttggggaccggttaatacgccg 
ctcctgctttcaagtaaaaaaattcaaccacgatatgcgattggaacagtctcagcaagt 
gaattttttgttacgtcatctgccgctttaagtttcattatctttttaggagtcactcaa 
attaattggtttgctgtaattgctttaagtctcggtggaatggtagcagcacctatttca
gcgtatttagttaaagtgttacccattaacattcttgcaatttgtgtcggtggtttaatt 
atttttacaaatagtaatgcattattaagctattttgtaaaagataacactatttcaaat 
acagttcgattcattattattcttgcaattattattttgcttgtttttcaagtcgttcga 
aacaagaaattgtctttttcttataagaaaagccgagtaaacaaatataattaa

Sequence 2178 

MGFGASSSSILLTYGIAPAVVSATVHFSEIATTAASGTSHWRFDNVHKPTMLKLAIPGSI 
SAFIGAGVLTFIHGDYIKPFIALFLLSMGFYILYQFLFKRAHEHHHHVGNLSSFKVIPQG 
FVAGFLDAIGGGGWGPVNTPLLLSSKKIQPRYAIGTVSASEFFVTSSAALSFIIFLGVTQ
INWFAVIALSLGGMVAAPISAYLVKVLPINILAICVGGLIIFTNSNALLSYFVKDNTISN
TVRFIIILAIIILLVFQVVRNKKLSFSYKKSRVNKYN* 

Sequence 2179 
Contig_0719_pos_0_1172,

is similar to (with p-value 0.0e+00)

>gp:gp|AJ000974|BSPYREYLO_4 Bacillus subtilis pyrE to yloA g
ene region. NID: g2462954. >gp:gp|Z99112|BSUB0009_30 Bacillu
s subtilis complete genome (section 9 of 21): from 1598421 t
o 1807200. NID: g2633902.

atgtctaacaatgaaacaataaccaattatacaattaaacctcatggaggagaactcatc 
aatcgtgttgttgaaggaaacgaacgtgaacgtttgattgaggaagcattaaattttaaa 
ccgattactttaaatccttggggaatatcggatctagagctcataggtattggcggattt 
agtccccttacaggatttatgaacaaggaagactacactaaggttatagaggaaacacat 

25 ttaagcaatggcttagtttggagtattcctatcactttacctgtaacagaatccgaagca 
gataaacttgaaataggtgatgatattgctttatatggtgaagatggtcagttatatgga 
acgcttaaattagaagaaaagtacacatatgataaagaaaaagaagcgcgtttggtgtac 
ggaactactgaagaagctcatcctggagttaaaaaggtttatgaaaaaggtaatatatat 
ttaggtggtcctattaaactattaaatcgtccaaaacatgacgcgttttcaaattatcat 

30 ctggatccttcagagacgagacaattatttcatgatttaggttggaaaactgtcgtaggt 
tttcaaacgagaaatccagtgcatcgagcacatgaatatattcaaaaatcagcactagaa 
attgttgatggtctacttttaaatccactagttggtgaaacaaagtcagacgatattcca 
gcggatgtacgtatggaaagttatgaagtgatattaaaaaactattatcctgaagataga 
gcacgtctagtcatttatcctgctgcaatgcgctatgccggaccacgtgaagcgatactt 

35 catgcaactgtccgtaaaaattatggttgtacacattttattgtgggaagagatcacgct 
ggggtaggcgattattatggtacttatgaagcacaagagctgattactcaatttgaagat 
gaottagqtattcaaattttaaaatttgaacatgccttttattgcgaagcttgtggtaat 
atggcaactgctaaaacatgtccgcatgacgcttctcaacatttacatttaagtggtact
aaagtaagagaaaaactgcgtaatggcgaatcattgccaactaaattttcaagaccagaa 

gttgccgaagttctaattaaaggtttgcgagT

Sequence 2180 

MSNNETITNYTIKPHGGELINRVVEGNERERLIEEALNFKPITLNPWGISDLELIGIGGF
SPLTGFMNKEDYTKVIEETHLSNGLVWSIPITLPVTESEADKLEIGDDIALYGEDGQLYG
TLKLEEKYTYDKEKEARLVYGTTEEAHPGVKKVYEKGNIYLGGPIKLLNRPKHDAFSNYH
LDPSETRQLFHDLGWKTVVGFQTRNPVHRAHEYIQKSALEIVDGLLLNPLVGETKSDDIP
ADVRMESYEVILKNYYPEDRARLVIYPAAMRYAGPREAILHATVRKNYGCTHFIVGRDHA 
GVGDYYGTYEAQELITQFEDELGIQILKFEHAFYCEACGNMATAKTCPHDASQHLHLSGT 
KVREKLRNGESLPTKFSRPEVAEVLIKGLRV 


Sequence 2181 

Contig_0720_pos_409_906, 

is similar to (with p-value 3.0e-25) 

>sp:sp|P35164|RESE_BACSU SENSOR PROTEIN RESE (EC 2.7.3.-). >
pir:pir|S45560|S45560 hypothetical protein X18 - Bacillus su
btilis >gp:gp|L09228|BACDIA_27 Bacillus subtilis spoVA to se
rA region. NID: g410114. >gp:gp|Z99116|BSUB0013_23 Bacillus
subtilis complete genome (section 13 of 21): from 2395261 to
2613730. NID: g2634723.

atggatgctgaaggattatcagttgagaaggaattacaacctattcaacaccttcttgat
aaaatggagtctaaatatcgcatgcaaagtgaagaattaggtttaacaatgacgtttgat 
tctaataatgacgaacaattatggaactatgatatggatagaatggaccaagtgttaact 
aatttaattgataacgcaacaagatatacacaagctggtgattctataaagatttctatt 
gatgaagattcagatttcaatattttaacaataactgatacaggcactggtatagcaccg
gaacatctgaaacaagtatttgaccgtttttataaagtggacgctgctcgaaaaagaggt 
aagcaaggcaccggattaggacttttcatttgtaaaatgattattgaagaacacggggga 
cgtattgatgttgagagcgaattaggcaaaggtacttcatttattattagactacctaaa 
tc.'.ap.acaaattagttag 

Sequence 2182 

MDAEGLSVEKELQPIQHLLDKMESKYRMQSEELGLTMTFDSNNDEQLWNYDMDRMDQVLT 
NLIDNATRYTQAGDSIKISIDEDSDFNILTITDTGTGIAPEHLKQVFDRFYKVDAARKRG 
KQGTGLGLFICKMIIEEHGGRIDVESELGKGTSFIIRLPKSKQIS* 

Sequence 2183 

Contig_0720_pos_1420_2037, 

is similar to (with p-value 7.0e-39) 

>sp:sp|P50726|YPAA_BACSU HYPOTHETICAL 20.5 KD PROTEIN IN SER
A-FER INTERGENIC REGION. >gp:gp|Z99116|BSUB0013_17 Bacillus
subtilis complete genome (section 13 of 21): from 2395261 to
2613730. NID: g2634723. >gp:gp|L47648|BACSERA_2 Bacillus su
btilis phosphoglycerate dehydrogenase (serA), ypaA, ferredox
in (fer), ypbB, recS, ypbD, ypbE, ypbF, ypbG, ypbH, glutamat
e dehydrogenase (ypcA), ypdA, ypdB, ypdC, spore cortex lytic
enzyme (sleB), ypeB, ypfA, ypfB, cytidine monophosphate kin
ase (cmk), ypfD, ypgA, yphA, yphB, yphC, NAD+ dependent glyc
erol-3-phosphate dehydrogenase (glyc), yphE and yphF genes,
complete cds. NID: g1146195. >gp:gp|L47648|BACSERA_2 Bacillu
s subtilis phosphoglycerate dehydrogenase (serA), ypaA, ferr
edoxin (fer), ypbB, recS, ypbD, ypbE, ypbF, ypbG, ypbH, glut
amate dehydrogenase (ypcA), ypdA, ypdB, ypdC, spore cortex l
ytic enzyme (sleB), ypeB, ypfA, ypfB, cytidine monophosphate
kinase (cmk), ypfD, ypgA, yphA, yphB, yphC, NAD+ dependent
glycerol-3-phosphate dehydrogenase (glyc), yphE and yphF gen
es, complete cds. NID: g1146195.

atgggagaagatggaggttttttgttgttcaaatttcctccttttctaagtaagatgaat 
ggaaggagaaaattatatatgcaacaaaacaaacgtttgattacaattagtatgttaagt 
gcggtagcgtttgtgttaactttcatcaagtttccattgccatttataccaccgtatcta 

actctcgattttagtgatgtaccgacgttattagcaacattcctcttaagtcctattgct
gggattatcgttgcactcatcaaaaatattttaaattttctattcaatataggggatcct 
gttggaccagtagctaactttttagcaggcgtcagctttttgctatcatcatactatgtt 
tatagaaaaagaaaaaataatcgttctttaatttatggattaattacaggtacaatcgtt 
atgactattgttttgagcatcttaaattattttgtgttacttccattatatggaatgata 

tttaatttaggtgatgtgcttaataacgtaaaaattgttattgtgtctggagtcatacct
tttaatttaattaaaggcataatcatttccattatatttgtgctgttatttagaagatta 
agacatatcatcaaataa

Sequence 2184

MGEDGGFLLFKFPPFLSKMNGRRKLYMQQNKRLITISMLSAVAFVLTFIKFPLPFIPPYL
TLDFSDVPTLLATFLLSPIAGIIVALIKNILNFLFNIGDPVGPVANFLAGVSFLLSSYYV 
YRKRKNNRSLIYGLITGTIVMTIVLSILNYFVLLPLYGMIFNLGDVLNNVKIVIVSGVIP 
FNLIKGIIISIIFVLLFRRLRHIIK*

Sequence 2185

Contig_0720_pos_2823_3761, 

putative peptide of unknown function 

atgagtcatgcatttaactacaaaaccaataaaagtatctataatattttaacaggcaag 
aagtcacaccaaacgtttttcgatgcgtcaagccaacaacttttgtcattatatcatagt 
ttctcctaacttaaagtattcaacttttgagcaatttatactccaaaaggatgatfTtaaa
aaatcaattcaagtaaaaatacatccacaatacacttatgatagtctcactcaaaccttt 
agttgcatacaattacttatccaaacgttatctcatacacgcaaggagtcaaatacattt 
attccaatcgttcaaaatacctatatccaacaaagggttaaacaactttatcatcaggtc 
attgaatcaaatcaagtatcaaatactatagacgaaatatatttattatttgagaattta
aataataaatataaccatacatttcttcattattatttacaaggatatgaggaatccatg 
tatactagacaacaaataagtttaattgagagtataccacaatcagaattatttgaacga 
gaaatgaatgaactgattgacatattgaatcaattaaaagattcaacgaaatatccaata 
ctttctcaagctatcattctttcaccattactaacaaatacatacttaagctatcaaaag
ttaaaatctggtctcaatttaaaagaaattgctcaattacaaaatgtaaaacttaacaca
attgaagatcatattctagaaatgtatattaaaggttatttgatagactatacattattt 
ataaataaaaaagatattctcgaatttataaactactatcaaaaacatcgcggtgaacga 
ttaaaattttataaagaacattttactgattggacttactttcaaattaagttagttata 
gtaggaatagaaagaggtgatttaattgctgaaagataa 

Sequence 2186 

MSHAFNYKTNKSIYNILTGKKSHQTFFDASSQQLLSLYHSLPNLKYSTFEQFILQKDDFK 
KSIQVKIHPQYTYDSLTQTFSCIQLLIQTLSHTRKESNTFIPIVQNTYIQQRVKQLYHQV 
IESNQVSNTIDEIYLLFENLNNKYNHTFLHYYLQGYEESMYTRQQISLIESIPQSELFER
EMNELIDILNQLKDSTKYPILSQAIILSPLLTNTYLSYQKLKSGLNLKEIAQLQNVKLNT
IEDHILEMYIKGYLIDYTLFINKKDILEFINYYQKHRGERLKFYKEHFTDWTYFQIKLVI
VGIERGDLIAER* 

Sequence 2187 
Contig_0720_pos_5757_5161,

putative peptide of unknown function 

atgtcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagc 
tatcctagtgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaa 
ttaaaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaa 

caactcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagct
atgattattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgct
tttgttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatc
aacaagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataata
agagggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcag
cctaatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaaca
attcattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag 

Sequence 2188 

MSMDKATKYALQLRVIAQESYPSVDRHSFLVEKLRLLIQQLKQSIHHLKQLDDAMIQLAQ 
QLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAFVGIDIKRYQSGHTHCRDTI
NKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQPNEKPHKTAIIACINRLLKT 
IHYLVMNHKLYDYQMSPH* 

Sequence 2189 
Contig_0721_pos_858_1772,

is similar to (with p-value 0.0e+00)

>sp:sp|P50307|METK_STAAU S-ADENOSYLMETHIONINE SYNTHETASE (EC
2.5.1.6) (METHIONINE ADENOSYLTRANSFERASE) (ADOMET SYNTHETAS
E). >gp:gp|U36379|SAU36379_1 Staphylococcus aureus S-adenosy
lmethionine synthetase gene, complete cds. NID: g1020316.

atggttaaaaataaacagtctcctgatattgcacagggtgtagacaaagctcttgagtat 
cgaaatgatatttctgaagaagaaattgaagcaaeaggtgcaggtgaccaaggattaatg 
tttggatatgcaactgatgaaactgatacgtatatgcctctacctatattcctgtcacat 
caacttgctaaacgattggctgatgtacgaaaagatgaaattttagattatcttcgtcca 

gatggaaaagtacaggtgactgttgaatatggtaaagatgacaaacctagacgtattgat
accattgtagtttctacacaacatgctgaagatgtagagttagcacaaattgaaaaggac 
attaaaacgcatgttatttacccaactgtagataaagctttattagatgatgaaactaaa 
ttttacattaacccaactggacgtttcgttattggaggacctcaaggagatgctggttta 
actggacgtaaaattatagttgatacgtatggtggttatgcccgtcatggtggaggVtgt 
tttagtggtaaagatcctactaaagtagatcgttcagcagcttatgcagcaagatatgta
gctaaaaatattgttgcagctggtttagctaaacaatgtgaagtacaacttgcatatgca 
attggtgtagcagaacccgtttccatttcaattaatacgtttgatactggaaaggtttca 
gaagcacgtttagttgaagctgtaagaaagcattttgatttaagaccagcaggtatcatt 
aaaatgttagacttaaaacaaccgatatatagacaaacagcagcgtatggtcattttgga
cgtacagacgtattgttaccatgggaaaaattagataaagtcaatgttttaaaagatgct 
gttgaaattcaatga 

Sequence 2190 

MVKNKQSPDIAQGVDKALEYRNDISEEEIEATGAGDQGLMFGYATDETDTYMPLPIFLSH
QLAKRLADVRKDEILDYLRPDGKVQVTVEYGKDDKPRRIDTIVVSTQHAEDVELAQIEKD
IKTHVIYPTVDKALLDDETKFYINPTGRFVIGGPQGDAGLTGRKIIVDTYGGYARHGGGC
FSGKDPTKVDRSAAYAARYVAKNIVAAGLAKQCEVQLAYAIGVAEPVSISINTFDTGKVS
EARLVEAVRKHFDLRPAGIIKMLDLKQPIYRQTAAYGHFGRTDVLLPWEKLDKVNVLKDA
VEIQ*

Sequence 2191 

Contig_0721_pos_1923_2783,

putative peptide of unknown function 

gtggctgttttattctttatactcttcttagtagcgaaccatagtaagaagaaagtgaag
aatcaaacagaagcacattataaagaaaaagaacaacatctaaaagaatctcatgaagaa 
gctttagaaaaagagagagttgagaataaaaaagttgttacaaaacaaaaagaagatttt 
gacgtgacagttagtaacaaaaatcgtgaaattgatgcgttgaaactattctcaaaaaat 
catagtgaatatgttacagatatgcgattaattggtattcgtgagagactagtaaatgaa 

aagagaatacgccctgaagatatgcatattatggcaaatattttcttgcctagtaatgag
ttaactaatattgaacgtgtgagtcatcttgtacttacacgaactggcttatatataatt 
gattctcaattattaaaaggccatgtttataatggtattagtggtgcgcaatttaaagaa 
ttacctacgatgtcacaagtttttgacacgctcgacttagattcatcacagccacagaca 
ttggtcttagatcaaaatgaagatcaacattcattatcttttgttaattattcagataag 

attaaacatattgaaaaattagcaggagatttacaaaatgaattgaataccaaatatacg
cctacatcaatactgtattttaatcctaaaaaggataatgatgttacaatttctcattat
acgcc.g'.catcaaatgttaaagttttagttggtcctgaacaattagatgaattctwCaac 
aagtttgttttccatggacgcatacagtacaatgtggatgatttacaagatatcatggat 
aaaatcgagtcattcaattaa 

Sequence 2192 

VAVLFFILFLVANHSKKKVKNQTEAHYKEKEQHLKESHEEALEKERVENKKVVTKQKEDF
DVTVSNKNREIDALKLFSKNHSEYVTDMRLIGIRERLVNEKRIRPEDMHIMANIFLPSNE 
LTNIERVSHLVLTRTGLYIIDSQLLKGHVYNGISGAQFKELPTMSQVFDTLDLDSSQPQT 
LVLDQNEDQHSLSFVNYSDKIKHIEKLAGDLQNELNTKYTPTSILYFNPKKDNDVTISHY
TQSSNVKVLVGPEQLDEFFNKFVFHGRIQYNVDDLQDIMDKIESFN* 

Sequence 2193 

Contig_0721_pos_3101_3799, 
is similar to (with p-value 2.0e-56)

>gp:gp|AL034443|SC4B5_1 Streptomyces coelicolor cosmid 4B5.
NID: g4007668. 

atgatttatggtaacgaagaaacggtaggtcaaggtattaaagaaggattagaatcgact 
ggtttaagtcgtgaagatttatttataacttctaaattatggctaactgattttggacgt 

caaaatgtggaagatgcctatcgacaatctgttgcaaaattaggactggattatttagat
tt^.tatctgatgcattggccaggtacaaatgaagcggtaatgattgatacttggar/iggt 
atggaagacttgtataaacaaaatcaggtgaaaaatattggtgtgagcaatttta.-.tcca 
gaacattttgaagcattgcttgcccaagtttctattaaaccggtcattaaccaagtagaa 
tttcatccttatttaacacaaaatgaattacgaaagtatttagaagctcaaaatattatc 

atggaatcatggtctccattaatgaattcacaaattctccatgatgaagtcataaatgaa
gtagctaatgaagtaggaaaaactccagcccaagttgtaataagatggaatattcagcac 
gatgttgttgttatacctaaatccgtaacaccacatcgcatagaagaaaatttagacgta 
tggaattttgaattaagcgacaaccaaatggaaagaatcgatcaattaaatcaagataaa 
agaattggacctaaccctcttgaatttaacggtaagtaa 

Sequence 2194

MIYGNEETVGQGIKEGLESTGLSREDLFITSKLWLTDFGRQNVEDAYRQSVAKLGLDYLD
LYLMHWPGTNEAVMIDTWRGMEDLYKQNQVKNIGVSNFTPEHFEALLAQVSIKPVINQVE
FHPYLTQNELRKYLEAQNIIMESWSPLMNSQILHDEVINEVANEVGKTPAQVVIRWNIQH
DVVVIPKSVTPHRIEENLDVWNFELSDNQMERIDQLNQDKRIGPNPLEFNGK* 

Sequence 2195 

Contig_0721_pos_5489_4476,

putative peptide of unknown function

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 
gcttgtttaggaccgacgcttaaacaaacagacagcttacctatacatgagttaatattc 
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 

aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattgtttagt
agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt 
cttgatatcgacaaggatgtactgattacacatatattcaattctacagataagggaatg 
tcaatggataaagctacaaaatatgcacttcaattaagagtgattactcaagaaagctat 
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta 

aaacaatctattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa
ctcgattattttgaaaatattcattcgatacctggtattggtaaactaagcacagctatg 
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt 
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataacaaga 

gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct
aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt 
cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag 

Sequence 2196 

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDSLPIHELIF
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV 
LDIDKDVLITHIFNSTDKGMSMDKATKYALQLRVITQESYPNVDRHSFLVEKLRLLIQQL 
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF 
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP 

NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH*

Sequence 2197 

Contig_0724_pos_3443_3748,

putative peptide of unknown function 

atgctcctttctgctatactccttataaaaggaggtgaaaatatgaaaagttttattatt
gcgtatgatttaaataaccaaaaggattatccaaaattaatagagcgtattgaggattat 
cctaatgttgctaaaatcaataaatcagtttggtttattaattcaactaatgatgctaaa 
actattc.gaaacgaattaaaaatgtttattgatagcgatgatagtttgttcgttgcicaag 
ctgactggtgaagccgcatggtctaatgtaatttgcagttcacaacatttaaaagattat 

ctttag

Sequence 2198 

MLLSAILLIKGGENMKSFIIAYDLNNQKDYPKLIERIEDYPNVAKINKSVWFINSTNDAK 
TIRNELKMFIDSDDSLFVGKLTGEAAWSNVICSSQHLKDYL* 

Sequence 2199 

Contig_0724_pos_4321_4635,

putative peptide of unknown function 

atgtgcttttcaaaaagaatgaaacaatcaagagaaaaacaaggtatgactttagctgaa 
ctaggaagaaaaatcggtaaaactgaagctactgtacaacgttatgaaagcgggaatatt
aaaaatcttaaaaatgatactattgaaagtatagctactgcattaaatgttaaccctgct 
ttcttgatgggttggatagaagaagttgaggaacaaccacaacatcgtgcagcgcatctt 
gatggtgatttaactgacgaagaatggcaagaaattcttgattacgctgaatacataaga 
agtaaaagaaaataa 

Sequence 2200

MCFSKRMKQSREKQGMTLAELGRKIGKTEATVQRYESGNIKNLKNDTIESIATALNVNPA 
FLMGWIEEVEEQPQHRAAHLDGDLTDEEWQEILDYAEYIRSKRK* 

Sequence 2201 

Contig_0724_pos_4639_0,

putative peptide of unknown function 

gtgttttatgtggggaaatatgaagatatgttaattgaacatgactatattgaagtcatt 
gaatgtgataacttacctaaaaggttatctggtttgtggcttggagatatgattttaatt
aatcgtaacttgcctattacttccaaacttgaaacacttgcagaggaactcgctcataac 
gaacttacatatggaaatatagttgatcaaagtagttttaatcatagaaaatttgaaggt 
tatgcacgtaggttagcctatgaaaagttaatccctcttaaagatattgtaaaagcattt 
ttcicaaggcattcatgacttgtatgaacttgctaatttttttga 

Sequence 2202 

VFYVGKYEDMLIEHDYIEVIECDNLPKRLSGLWLGDMILINRNLPITSKLETLAEELAHN 
ELTYGNIVDQSSFNHRKFEGYARRLAYEKLIPLKDIVKAFLQGIHDLYELANFFX 

Sequence 2203

Contig_0724_pos_3437_2028, 

is similar to (with p-value 0.0e+00)

>gp:gp|M57689|BACSPO0K_5 Bacillus subtilis spo0K operon. NID
: g143602.

atgcaagatttacaagtatttaattttgaagatttaccagtaagaaaaatagaagtagat
ggagaaccatattttttaggtaaagacgtggcagaaatattaggttacacaagatctgat 
aatgcaattagaaatcatgttgatgatgaagataagctgacgcaccaagttagtgcatca 
ggtcaaaaacgaaacatggtaatcatcaacgaatctggtttatacagcttaatctttgac 
gctgctaaacaaagtaaaaacgaaagtattagaaagaaagctaaacgttttaaacgtccc 

aaaaatgcgaaaataaaggaggtaaaagtaatgacagaaacggtattagaagtaaatgat
ttgcacgtttcctttgatattgctgcaggagaagtgcaagctgtcagaggcgtggatttt 
catttaaataaaggggaaacgttagccattgttggagaatctggatctggaaaatctgta 
acaactaaggcaattacaaaactttttcaaaaggatacaggaagaataaaaaagggagaa 
attttatttttaggtgaggacttagctcagaaaagtgaaaaagaactgatacagctaaga

ggtcgagatatttcaatgatatttcaggatcctatgacttctttaaatccaacaatgcaa
atcggaaagcaagtcatggaacctttgattaaacataagaaattaagtaaagcaaaggcc 
aagcaaagagcattggaaattttgaatttagttgggttacctcgtgctgaaaaacgattt 
aaagcttatccacatcaattttcaggaggacaacgtcagagaatagttattgcaatagca
ttggcatgtgagcctaaaatattaattgctgatgagcctacaactgctttagatgtgaca 

atgcaggctcaaattttagatcttatgaaagaactacaaaataagattgaaacttcaatt
atctttattacgcatgatttaggcgttgtagcaaatattgcggacaaagtagccgtaatg 
tatgggggacagatggttgaaacaggggatgtgaatgaaatattttatgatcctaaacat 
ccctatacctggggattgctttcttcaatgcctgatttaacaaccagtaatgacacggac 
ttaattgcaattccaggtacaccaccagatttacttcatccaccaattggtgatgctttc 

gcacgtagaagtcgatatgctttagatattgattttaaagaagaaccaccttggttcaaa
atttcacccacacattttgttaaatcttggttattagatgcaagagctccaaaagttacg 
ccaccttcaatggttcaaaaacgattaagaacaatgccaagtaattatgaacaaccacat 
agagtagagagggtggcttttaatgagtaa 

Sequence 2204

MQDLQVFNFEDLPVRKIEVDGEPYFLGKDVAEILGYTRSDNAIRNHVDDEDKLTHQVSAS 
GQKRNMVIINESGLYSLIFDAAKQSKNESIRKKAKRFKRPKNAKIKEVKVMTETVLEVND 
LHVSFDIAAGEVQAVRGVDFHLNKGETLAIVGESGSGKSVTTKAITKLFQKDTGRIKKGE 
ILFLGEDLAQKSEKELIQLRGRDISMIFQDPMTSLNPTMQIGKQVMEPLIKHKKLSKAKA

KQRALEILNLVGLPRAEKRFKAYPHQFSGGQRQRIVIAIALACEPKILIADEPTTALDVT
MQAQILDLMKELQNKIETSIIFITHDLGVVANIADKVAVMYGGQMVETGDVNEIFYDPKH 
PYTWGLLSSMPDLTTSNDTDLIAIPGTPPDLLHPPIGDAFARRSRYALDIDFKEEPPWFK 
ISPTHFVKSWLLDARAPKVTPPSMVQKRLRTMPSNYEQPHRVERVAFNE* 

Sequence 2205

Contig_0724_pos_1756_1097, 

is similar to (with p-value 3.0e-81) 

>pir:pir|E38447|E38447 sporulation initiation protein spo0KE
- Bacillus subtilis >gp:gp|M57689|BACSPO0K_6 Bacillus subti
lis spo0K operon. NID: g143602.

atgatatttcaagatccttatgcatctctaaatcctcggttaaaggtaatggatatcgta
gctgaaggaatagatattcacaaacttgctagtagtcagcgtgatcgaaagaaacgtgta 
tacgaccttttagaaacagttggtttaggtaaagaacacgcgaatcgttacccacatgag 

ttttcaggcggacaaagacaacgtattggtatcgcacgtgcattagctgtagagccagaa
tttattattgcagatgaaccgatatcagcattagatgtatcgattcaagctcaagtcgtt 
aatcttttattaaagctacaacgtgaacgtgatattactttattgtttattgctcatgat 
ttatcaatggtgaaatatatttccgatagaattgcagtgatgcacttcggtaaaattgta 
gaaattggaccggctgatgatatttataattatccattacatgattatactaagtcatta 

ttaagtgccattccacagcctgatcctgatgttgagagaaatcgtcaacgtgttttatat
catgaagatgcaacgcttaatgaagaacgtcaattaaatgaaattagaccacaacattat 
gtattttctactcaaaacgaagcagttaaattgaaacaaaagtatggtttgtctgtttaa 

Sequence 2206

MIFQDPYASLNPRLKVMDIVAEGIDIHKLASSQRDRKKRVYDLLETVGLGKEHANRYPHE 
FSGGQRQRIGIARALAVEPEFIIADEPISALDVSIQAQVVNLLLKLQRERDITLLFIAHD 
LSMVKYISDRIAVMHFGKIVEIGPADDIYNYPLHDYTKSLLSAIPQPDPDVERNRQRVLY 
HEDATLNEERQLNEIRPQHYVFSTQNEAVKLKQKYGLSV* 

Sequence 2207 

Contig_0724_pos_1070_234,

is similar to (with p-value 2.0e-51)

>pir:pir|A53310|A53310 pheromone cAD1 binding protein precur
sor - Enterococcus faecalis plasmid pAD1 >gp:gp|L19532|AD1TR
AC_2 Plasmid pAD1 (from Enterococcus faecalis strain: DS16)
hemolysin bacteriocin (traC) gene, complete cds, traA and tr
aB genes, 3' end. NID: g388267.

atgaaaggttttaaagtcttaattattttattaagtgtatgcataattttatctgcttgt 
agtaataagcagagtttatattcagaccaggggcaagtttttaggaaggtaatcacacaa
gatatgactacactagatacagctttaattacagatgctgtttctggtgatatagcagct 
caagcttttgaaggattatatactttaaataaagaagacaaagctgaaccagctattgct 
aaatcttttccaaagaaaagtaatggtggcaaaacacttacgattaatttaagaaaaaat 
gcaaaatggtccaatggagattcggtaactgcatatgacttcgtatatgcgtggagaaag 
gtagttaatcctaagacggcttctgagtttgcatacataatgagcgatataaaaaatgca
gatgaagttaatgcaggtaaaaaatcagtcaaggatttgggtatcaaggctataggtaaa 
tataaattacaagtagatttagaaagacctgtaccttatattaatgaactattagcactt 
aatacatttaatcctcaaaatgagaaagttgctaaaaagtttggagaacaatatggtaca 
actgctgaaaaagcagtgtacaatggaccatttgaagtaacaaattggaaagtggaagat 
aaaattcaattagttaaaaatgaacaatattgggataagaagaatgtaaaattagataaa
gtgaactataaagtattaaaagatcaacaagcaggtgcatcgttatatgatactggctcg 
gtcgatgatactatgttaagtatactgcacaagcatctagtccatcagaaggtttag 

Sequence 2208 

MKGFKVLIILLSVCIILSACSNKQSLYSDQGQVFRKVITQDMTTLDTALITDAVSGDIAA
QAFEGLYTLNKEDKAEPAIAKSFPKKSNGGKTLTINLRKNAKWSNGDSVTAYDFVYAWRK 
VVNPKTASEFAYIMSDIKNADEVNAGKKSVKDLGIKAIGKYKLQVDLERPVPYINELLAL 
NTFNPQNEKVAKKFGEQYGTTAEKAVYNGPFEVTNWKVEDKIQLVKNEQYWDKKNVKLDK 
VNYKVLKDQQAGASLYDTGSVDDTMLSILHKHLVHQKV* 

Sequence 2209 

Contig_0727_pos_3292_0, 

putative peptide of unknown function 

atgtcttttcttaggaaacacaccgaaattatatttagttatatcatcggtatcgtttca 
ctttttacaggtctcattatttttattaacttacctttaatcaaacaatttaaaggtgac
aaaaaggttgatacgcatgtgcataacgtatgggaattcctaaatgccttttttgccgag 
attataaaagtgatgagtaaatttattggtggctttccaattacaagtgccatagtaatc 
atcgtatttggtattctagtgatgctgttaggtcacactttatttagaactattaaatac 
gattatgacatttcaattttctttttagttattggcattatgtactttatcattacatta 
ttgctaatgacacaagtgtatggcttttttgctatcgtctttattattccttttacagtt 
catattggttacatagtttataaagatgagttgaaccaagacaatcgaaagaaccattat 
atgtggattattgtaacttatggaatgagttatcttattacccaaatttcgctatatgga 
cgtattgacgcaaatgaaattgaatcaattgatattttaagtgtaaatacattcttcatt 
attatgtggttattaggtcagatggctata 

Sequence 2210 

MSFLRKHTEIIFSYIIGIVSLFTGLIIFINLPLIKQFKGDKKVDTHVHNVWEFLNAFFAE 
IIKVMSKFIGGFPITSAIVIIVFGILVMLLGHTLFRTIKYDYDISIFFLVIGIMYFIITL
LLMTQVYGFFAIVFIIPFTVHIGYIVYKDELNQDNRKNHYMWIIVTYGMSYLITQISLYG 
RIDANEIESIDILSVNTFFIIMWLLGQMAI 

Sequence 2211 

Contig_0727_pos_0_369, 

putative peptide of unknown function 

atgtttccaccatattttatcacgaacagggagtgtaagagcatgacgcgacagagaatc 
gccattgatatggatgaagtgcttgctgatacattgggtgctgttgttaaagcggtcaat 
gaacgagcggatttaaatatcaaaatggaatcattaaacggtaaaaaattaaaacatatg 
atacccgagcatgaggggttagtcatggatattttaaaagaacctggattctttagaaat 
ttagatgtaatgccgcatgctcaagaagttgtaaaacaactcaatgagcattacgacata 
tacatagccacagcagcgatggatgttccaacctcttttcatgacaaatatgaatggtta 
ttcaaatga

Sequence 2212 

MFPPYFITNRECKSMTRQRIAIDMDEVLADTLGAVVKAVNERADLNIKMESLNGKKLKHM 
IPEHEGLVMDILKEPGFFRNLDVMPHAQEVVKQLNEHYDIYIATAAMDVPTSFHDKYEWL
FK* 

Sequence 2213 

Contig_0728_pos_6214_6576,

is similar to (with p-value 4.0e-29) 

>gp:gp|AJ000339|LDGAPPGK_3 Lactobacillus delbrueckii ygaP, g
ap, pgk, tpi, and ycsE genes. NID: g2624189.

atgactttctattgttttattcccacatcaaagtatttaaaaacttatttatcattaata
gcttttataccaggtaattctttaccttctaagtattctaatgatgctcctccaccagta
gagatgtgtgtaaagtcatcttcgaaacctaatgaaattgctgctgcggcagagtcacca
ccaccaataatagtagtagcgtcttccaatttagcaatagactcacatacaccgattgta
cctttagcaaaattactaaattcgaatacacccataggtccattccatactacagtatgt
gcaccttgtaattctttattaaataattctactgttttaggtccaatatccattgcttct
tga

Sequence 2214 

MTFYCFIPTSKYLKTYLSLIAFIPGNSLPSKYSNDAPPPVEMCVKSSSKPNEIAAAAESP 
PPIIVVASSNLAIDSHTPIVPLAKLLNSNTPIGPFHTTVCAPCNSLLNNSTVLGPISIAS



Sequence 2215 

Contig_0728_pos_6923_6258, 

is similar to (with p-value 2.0e-69) 

>gp:gp|AJ000339|LDGAPPGK_3 Lactobacillus delbrueckii ygaP, g 
ap, pgk, tpi, and ycsE genes. NID: g2624189. 

atL,g ^ jric.agaaattaaat ttattggtggcgtagtgaatgatccacaaaaaccag^.agtt 
gctattttaggtggcgctaaagtttcagataaaattaatgttatcaaaaatttagttaat
atcgcagataaaatcttaatcggtggcggtatggcttatacatttattaaagcgcaaggt 
aaggaaataggtctttcattattggaagaagacaaaattgattttgctaaagacttgtta
gagaataatggcgatcaaatagtattacctgtagattgtaaaatcgctaaagaattttct 
aatgatgcaaaaatcactgaagtatctatcaatgaaatcccttcagatcaagaagcaatg 
gatattggacctaaaacagtagaattatttaataaagaattacaaggtgcacatactgta 
gtatggaatggacctatgggtgtattcgaatttagtaattttgctaaaggtacaatcggt
gtatgtgagtctattgctaaattggaagacgctactactattattggtggtggtgactct 
gccgcagcagcaatttcattaggtttcgaagatgactttacacacatctctactggtgga 
ggagcatcattagaatacttagaaggtaaagaattacctggtataaaagctattaatgat 
aao t ,1 a 

Sequence 2216 

MEKEIKFIGGVVNDPQKPVVAILGGAKVSDKINVIKNLVNIADKILIGGGMAYTFIKAQG
KEIGLSLLEEDKIDFAKDLLENNGDQIVLPVDCKIAKEFSNDAKITEVSINEIPSDQEAM 
DIGPKTVELFNKELQGAHTVVWNGPMGVFEFSNFAKGTIGVCESIAKLEDATTIIGGGDS 
AAAAISLGFEDDFTHISTGGGASLEYLEGKELPGIKAINDK*

Sequence 2217 

Contig_0728_pos_6127_5366, 

is similar to (with p-value 2.0e-90) 

>sp:sp|P35144|TPIS_BACME TRIOSEPHOSPHATE ISOMERASE (EC 5.3.1
.1) (TIM). >pir:pir|JQ1955|JQ1955 triose-phosphate isomerase
(EC 5.3.1.1) - Bacillus megaterium >gp:gp|M87647|BACPGKTIMG
Bacillus megaterium glyceraldehyde-3-phosphate dehydrogen
ase (gap), phosphoglycerate kinase (pgk), and triose phospha
te isomerase (tpi) genes, complete cds. NID: g143315. >gp:gp
|M87648|BACTPIPGK_2 Bacillus megaterium triose phosphate iso
merase (tpi) gene, complete cds, and phosphoglycerate kinase
(pgk) gene, 3' end. NID: g143759.

atgaqaaoaccaattatagccggaaactggaaaatgaataaaacagttcaagaag': taaa 

gactttgtaaacgaattaccaacattacctgatcctaaagaagtagaatcagttatttgt
gcaccaacaatccaattagacgctttagtaacagctgttaaagatggtaaagcaaaaggg 
ttaaaaattggagcacaaaacgcttactttgaagaaagcggtgcttatactggagaaact 
tcaccagtagcattatctgaattaggtgttaaatatgtagtgattggtcactcagagcgt 
cgtgactatttccacgaaactgacgaagaagtaaacaaaaaagcgcatgctatcttcaat 

cacggtatgacacctattatttgtgtaggtgaatctgatgaagaacgtgaagctggtaaa
gcaaataaaatcgtaggtaatcaagtgaaaaaagctgtcgaaggtttatcagatgatcaa 
cttaaagaagttgttattgcatatgaaccaatttgggctatcggtactggtaagtcatct 
acatctgaagatgcaaatgaaatgtgtgctcacgtacgtcaaacattagctgacttatct 
agtcaagaggttgctgacgctacacgtattcaatatggtggtagtgttaaacctaataac 

attaaagaatatatggctcaatcagatatcgatggcgctcttgtaggtggcgcatcatta
aaagttgaagatttcgtacaattgttagaaggtgcaaaataa 

Sequence 2218 

MRTPIIAGNWKMNKTVQEAKDFVNELPTLPDPKEVESVICAPTIQLDALVTAVKDGKAKG 
LKIGAQNAYFEESGAYTGETSPVALSELGVKYVVIGHSERRDYFHETDEEVNKKAHAIFN
HGMTPIICVGESDEEREAGKANKIVGNQVKKAVEGLSDDQLKEVVIAYEPIWAIGTGKSS 
TSEDANEMCAHVRQTLADLSSQEVADATRIQYGGSVKPNNIKEYMAQSDIDGALVGGASL 
KVEDFVQLLEGAK*

Sequence 2219

Contig_0728_pos_5363_3846,

is similar to (with p-value 0.0e+00)

>sp:sp|P39773|PMGI_BACSU 2,3-BISPHOSPHOGLYCERATE-INDEPENDENT
PHOSPHOGLYCERATE MUTASE (EC 5.4.2.1) (PHOSPHOGLYCEROMUTASE)
(BPG-INDEPENDENT PGAM) (VEGETATIVE PROTEIN 107) (VEG107).

atggcaaaacaaccaactgccttaatcatcttagatggtttcgcaaatcgtgaaagtgaa 
catggcaatgcagttaagcaagcacataaacctaattttgatcgatattatgaaaaatat 
cctacaacacaaatagaagctagtggcttagatgtaggtcttcctgaaggtcaaatgggt 
aactctgaagtaggacatatgaatattggtgcaggacgcatcgtatatcaaagtttaact 
cgtattc^ataaatcgattgaagacggagaattctttgataacactgtattaaata.: cgct
gttaaacatgttaaagacaatggctctgcgcttcatgtattcggattgctttctgatggt 
ggtgtacacagtcattataagcatctatttgctattttagaattagctaaaaagcaagga 
atagataaagtatatgtccacgcatttttagatggtcgtgatgttgatcaaaaatctgct 
ttgaaatatatagaggaaactgaagataaatttaaagaattaggtgtaggccaattcgct
tctgtttcaggacgttattatgctatggaccgtgacaagcgttgggatcgtgaggaacgt 
gcctataatgctattcgtaactttgaaggtcctacatttacttcagctaaagcaggcgtt 
gaagctaattataaaaatgatgtgactgatgaattcgtcgaaccgtttatagttgaaggc 
caaaacgatggtgtgaacgatggagacgcagtaatcttttataatttccgtccagataga 

gcagctcaactttcagaaatctttactaataaagcgtttgatggatttaaagttgaacaa
gtggacaacttattctacgctacattcacgaaatataatgacaatgtagatgctgaaatt 
gtatttgaaaaagttgacttaaataatacaatcggtgaagttgctcaagataatggcttg 
aaacaattacgtatcgctgaaactgaaaagtatccacatgtaacatactttatgagtggt 
ggacgaaatgaagagtttgaaggagaacgtcgtagactcatcgattctccaaaagtagcg 

acttatgatttaaaacctgagatgagtgcatatgaagttaaagatgcattattagaagag
ttagacaaaggtgacttagatttaattctactgaactttgctaacccagatatggttgga 
catagtggtatgcttgaaccaacaattaaagcaatcgaagcagtagatgagtgtcttggt 
gaagtcgttgacaaaattattgatatgggtggtcatgccatcatcactgcagaccacggt 
aactcagatcaagtattaactgatgacgaccaacctatgacgacacacacaactaatcct 

gttccagttattgtaactaaagaaggtgttacattaagagaaactggacgtttaggcgat
ttagcgccgacattattagatttattaaatgttaaacaaccatctgaaatgacaggtgaa 
tcactgattaaacattag 

Sequence 2220 

MAKQPTALIILDGFANRESEHGNAVKQAHKPNFDRYYEKYPTTQIEASGLDVGLPEGQMG
NSEVGHMNIGAGRIVYQSLTRINKSIEDGEFFDNTVLNNAVKHVKDNGSALHVFGLLSDG
GVHSHYKHLFAILELAKKQGIDKVYVHAFLDGRDVDQKSALKYIEETEDKFKELGVGQFA 
SVSGRYYAMDRDKRWDREERAYNAIRNFEGPTFTSAKAGVEANYKNDVTDEFVEPFIVEG 
QNDGVNDGDAVIFYNFRPDRAAQLSEIFTNKAFDGFKVEQVDNLFYATFTKYNDNVDAEI 

VFEKVDLNNTIGEVAQDNGLKQLRIAETEKYPHVTYFMSGGRNEEFEGERRRLIDSPKVA
TYDLKPEMSAYEVKDALLEELDKGDLDLILLNFANPDMVGHSGMLEPTIKAIEAVDECLG 
EVVDKIIDMGGHAIITADHGNSDQVLTDDDQPMTTHTTNPVPVIVTKEGVTLRETGRLGD 
LAPTLLDLLNVKQPSEMTGESLIKH* 

Sequence 2221

Contig_0728_pos_3707_2403,

is similar to (with p-value 0.0e+00)

>gp:gp|AF065394|AF065394_1 Staphylococcus aureus enolase (en
o) gene, complete cds. NID: g3152724.

atg Jcar.i: tat tacagatgtt tacgctcgcgaagtcttagactcacgtggtaacc<;aaca
gttgaagttgaagtattaactgaaagtggtgctttcggacgtgcattagtaccttctggt 
gcttctactggtgaacatgaagcagttgaattacgtgatggagataaatcacgttattta 
ggtaaaggtgtgactaaagcggtagaaaatgttaacgaaatgatcgcaccagaaatcgtt 
gaaggtgaattttcagttttagatcaagtatctattgataaaatgatgattcaattagac 

ggtacacacaacaaaggtaaattaggtgcaaatgccattttaggtgtttctattgccgta
gctcgtgcagctgctgacttattaggtcaaccattatataaatatttaggtggatttaat 
ggtaaacaattgccagtacctatgatgaatattgttaatggtggttctcactcagatgca 
ccaattgctttccaagagttcatgattttacctgtaggtgctgagtcattcaaagaatca 
ttacgttggggtgcagaaatcttccataaccttaaatcaatcttaagtgaacgcggttta 

gaaactgcagtaggtgatgaaggtggtttcgcacctagatttgaaggcactgaagacgct
gtagaaactattattaaagctatcgaaaaagcaggatacaaaccaggtgaagatgtattc 
ttaggatttgactgtgcttcttctgaattctatgaaaatggtgtttatgattacactaaa 
ttcgaaggtgaacacggtgctaaacgtagtgcagcagagcaagttgactacttagaagaa 
ttaattggtaaatatccaatcatcactattgaagatggtatggatgaaaacgattgggaa 

ggttggaaacaattaactgatcgtatcggtgataaagttcaattagttggtgatgattta
ttcgtaactaacactgaaattttatctaaaggtatcgaacaaggtattggtaactcaatc 
ttaatcaaagtaaaccaaatcggtacattaactgaaacattcgatgctattgaaatggct 
caaaaagctggatatactgcggttgtatctcaccgttctggtgaaactgaagatactaca 
attgctgatatcgcagttgctacaaatgcaggccaaattaaaacaggttcattatctaga 
actgaccgtattgctaaatacaatcaattattacgtattgaagatgaattatacgaaaca
gctaaatttgaaggaattaaatctttctacaatttagataaataa 

Sequence 2222 

MPIITDVYAREVLDSRGNPTVEVEVLTESGAFGRALVPSGASTGEHEAVELRDGDKSRYL
GKGVTKAVENVNEMIAPEIVEGEFSVLDQVSIDKMMIQLDGTHNKGKLGANAILGVSIAV 
ARAAADLLGQPLYKYLGGFNGKQLPVPMMNIVNGGSHSDAPIAFQEFMILPVGAESFKES 
LRWGAEIFHNLKSILSERGLETAVGDEGGFAPRFEGTEDAVETIIKAIEKAGYKPGEDVF 
LGFDCASSEFYENGVYDYTKFEGEHGAKRSAAEQVDYLEELIGKYPIITIEDGMDENDWE 
GWKQLTDRIGDKVQLVGDDLFVTNTEILSKGIEQGIGNSILIKVNQIGTLTETFDAIEMA
QKAGYTAVVSHRSGETEDTTIADIAVATNAGQIKTGSLSRTDRIAKYNQLLRIEDELYET 
AKFEGIKSFYNLDK* 

Sequence 2223 
Contig_0728_pos_2237_1779,

putative peptide of unknown function 

atgacagactcaaatgctaaagaaataagaactggacgtttaattgcgataagttcatta 
gtgttttgtattttacttatcatacaccactttattgtattagatgaatcaacagctaaa 
tcaattttatctttagctggtcaaaaaacatcagatacagcagtgaaaaacattttaaat 
agtgaccgatacactggaattatgtatattttagcttacttagcaggtactgttgctttc
tggaatcgccatccatatttatggtggtttatgtttgccgtatatatttctaatgcacta 
tttacactcgtaaatctttacttatttattcaaggtattttagatgtaaaaaatgtactt 
gcagttttaccaattttaattgtagtgattggatctataattctagcaatttatatgcta 
gttgtttctattacacgtaaaagtactttcaatagatag 

Sequence 2224 

MTDSNAKEIRTGRLIAISSLVFCILLIIHHFIVLDESTAKSILSLAGQKTSDTAVKNILN 
SDRYTGIMYILAYLAGTVAFWNRHPYLWWFMFAVYISNALFTLVNLYLFIQGILDVKNVL 
AVLPILIVVIGSIILAIYMLVVSITRKSTFNR* 

Sequence 2225 

Contig_0728_pos_1710_1147, 

is similar to (with p-value 8.0e-60)

>pir:pir|A40585|A40585 DNA topoisomerase (ATP-hydrolyzing) (
EC 5.99.1.3) chain B - Staphylococcus aureus >gp:gp|X71437|S
AGYRREC_2 S.aureus genes gyrB, gyrA and recF (partial). NID:
g296393. >gp:gp|D10489|STAGYRABA_1 Staphylococcus aureus ge
nes for DNA gyrase A and B, complete cds. NID: g540540.

atgcatacattaatcatcgttttattaattatagattgtattgcattagtgactgttgta 
ttactccaagaaggtaaaagtaatggactttcaggtgctattagtggtggcgctgaacaa
ttgtttggtaaacaaaaacaacgtggcgtcgatttattcttgcatagattaacaatacgt 
acgttattacttacattcttctatcgtttcatgagacctttaattgaagcgggctacgtt 
tatattgctcagccgcctttatataaactaacacaaggaaaacaaaaatattatgtattt 
aacgatagagaactagacaagttgaaacaagaattaaacccgtcaccaaaatggtcaatt 
gcacgttacaaaggtcttggtgaaatgaacgcagaccaattatgggaaacgactatgaat
cctgaacatcgctctatgttgcaagtgagacttgaagatgcaattgatgcagaccaaaca 
tttgaaatgttaatgggcgatgtagtagaaaatcgcagacaatttatcgaagacaatgca 
gtttatgccaacctagatttctag 

Sequence 2226

MHTLIIVLLIIDCIALVTVVLLQEGKSNGLSGAISGGAEQLFGKQKQRGVDLFLHRLTIR
TLLLTFFYRFMRPLIEAGYVYIAQPPLYKLTQGKQKYYVFNDRELDKLKQELNPSPKWSI
ARYKGLGEMNADQLWETTMNPEHRSMLQVRLEDAIDADQTFEMLMGDVVENRRQFIEDNA 
VYANLDF* 

Sequence 2227 

Contig_0728_pos_0_1110,

is similar to (with p-value 0.0e+00)

>sp:sp|P20831|GYRA_STAAU DNA GYRASE SUBUNIT A (EC 5.99.1.3).

atggctgaattacctcaatcaagaattaatgaacgaaatataaccagtgaaatgcgtgaa
tcattcttagactatgctatgagtgttatcgtttctcgtgcattacctgatgttagagac 
gg;;.ttaaagccagtacatcgtcgtattctttatggtttaaatgaacaaggtatgar;:jccc 
gataaaccttataagaaatctgcacgtatagtcggggatgtcatgggtaaatatc.-.ccct 
catggtgattcttcaatttatgaagcaatggtaagaatggcccaagactttagttatcgt
tatccacttgtagatggtcaaggtaactttggctctatggatggtgacggtgcagctgca 
atgcgttataccgaagcacgtatgactaaaataacattagaacttttacgtgatattaac 
aaagacacaattgattttattgacaactatgatggtaatgaaagagagccgtcagtctta 
cctgcacgtttccctaacttactagtaaatggtgcggcaggaattgccgtaggtatggct 

acaaatattcctccccacaatttaactgaagttattgatggtgtgctcagtttaagtaag
aatccagacatcacaattaatgagctgatggaagacatacaaggtcctgattttcctaca 
gctggtttagtactagggaaaagtggtattcgtcgagcttatgaaacaggtcgtgggtca 
attcaaatgcgttctcgtgctgaaatagaagaacgtggtggtggccgtcaacgtattgtc 
gtaacggaaatacctttccaagtcaataaagcgcgtatgattgaaaaaatcgcagagtta 

gttagagataagaaaatcgacggtattacagatttacgtgatgaaacaagtttgcgtaca
ggtgtaagagtagttattgatgtacgtaaagatgcaaatgcgagtgttattttaaataat 
ttatataaacaaacgccattacaaacatcatttggtgtgaatatgattgctttagtgaat 
ggtagacctaaactaatcaatttaaaagaagcacttatccattacttagaacaccaaaaa 
acagtggttagacgacgtactgaatataat 

Sequence 2228 

MAELPQSRINERNITSEMRESFLDYAMSVIVSRALPDVRDGLKPVHRRILYGLNEQGMTP 
DKPYKKSARIVGDVMGKYHPHGDSSIYEAMVRMAQDFSYRYPLVDGQGNFGSMDGDGAAA
MRYTEARMTKITLELLRDINKDTIDFIDNYDGNEREPSVLPARFPNLLVNGAAGIAVGMA
TNIPPHNLTEVIDGVLSLSKNPDITINELMEDIQGPDFPTAGLVLGKSGIRRAYETGRGS
IQMRSRAEIEERGGGRQRIVVTEIPFQVNKARMIEKIAELVRDKKIDGITDLRDETSLRT
GVRVVIDVRKDANASVILNNLYKQTPLQTSFGVNMIALVNGRPKLINLKEALIHYLEHQK 
TVVRRRTEYN 

Sequence 2229

Contig_0730_pos_6751_0,

putative peptide of unknown function 

atgcttaattctacaatcaaatatgacactaagaagaaaaaactacctaaatttagtaaa 
ggtactaaaaagaaagacggtatattagatgttattagctctggtgtaaaaaatgatgtt 

aataaagtaaaagacattggtggtaaagcaagagacataggtggtactacgtttgacaaa
gcaaaagacataggtacaaaagcacttgataaagctaaagatgtgtctagcactgttatc 
aagggtattggagatgtttttgattatgtaggtcatcctatgaaattggtaaataaagtc 
tttgagaaagttggttttaacctagactttatgaaaaatgcaccattaccatttgattta 
atgacagctatgattaagaaacttaaaaatggtattaaagacttctttaatgaaggttta 

gactctgcaggcggtggagatggttcttcgttcactaaattcccaattactacggggtat
tatcctaatggtggtgctcctggttatagttttaatggtggtgctcactttggtattgac 
tatggcgctccatatggtacaactatcaat 

Sequence 2230 

MLNSTIKYDTKKKKLPKFSKGTKKKDGILDVISSGVKNDVNKVKDIGGKARDIGGTTFDK
AKDIGTKALDKAKDVSSTVIKGIGDVFDYVGHPMKLVNKVFEKVGFNLDFMKNAPLPFDL 
MTAMIKKLKNGIKDFFNEGLDSAGGGDGSSFTKFPITTGYYPNGGAPGYSFNGGAHFGID 
YGAPYGTTIN 

Sequence 2231

Contig_0730_pos_6983_6669, 

putative peptide of unknown function 

gtgctagacacatctttagctttatcaagtgcttttgtacctatgtcttttgctttgtca 
aacgtagtaccacctatgtctcttgctttaccaccaatgtcttttactttattaacatca 
ttttttacaccagagctaataacatctaatataccgtctttctttttagtacctttacta
aatttaggtagttttttcttcttagtgtcatatttgattgtagaattaagcattatcaag 
gtcgcattcaatcaggtgagaagattaaatatactaaataatatattaggtgattcgtta 
tgcaaatattattag 

Sequence 2232

VLDTSLALSSAFVPMSFALSNVVPPMSLALPPMSFTLLTSFFTPELITSNIPSFFLVPLL
NLGSFFFLVSYLIVELSIIKVAFNQVRRLNILNNILGDSLCKYY* 

Sequence 2233

Contig_0730_pos_6435_6010,

is similar to (with p-value 3.0e-21) 

>sp:sp|P42421|YXDJ_BACSU HYPOTHETICAL 26.6 KD SENSORY TRANSD
UCTION PROTEIN IN IDH 3'REGION. >gp:gp|D14399|BACIOLO_11 Bac
illus subtilis 15 kb chromosome segment contains the iol ope
ron. NID: g709980. >gp:gp|Z99124|BSUB0021_70 Bacillus subtil
is complete genome (section 21 of 21): from 3999281 to 42148
14. NID: g2636442.

atcgatcaagtgatgagtatggaacttggtgcagatgattatatgcaaaaaccattttat 
acaaacgtcttaattgctaagctacaagctatttatagacgcgtttatgaatttgcagtt
gaagaaaagagaacgttaagttggcaagacgctactgtggatttatcaaaagatagtatt 
caaaaagatgataaaactatctttttgtctaaaacagagatgattattttagagatgtta 
atcaataaacgtaatcaaatcgtgacacgagacactctcattactgctttgtgggatgat 
gaagcttttgttagtgataatactttaacagttaatgttaatagattaagaaaaaaatta 
tcagaaattgacatggatagtgcaattgaaaccaaagttggtaaaggatacttagctcat
gaataa 

Sequence 2234 

MDQVMSMELGADDYMQKPFYTNVLIAKLQAIYRRVYEFGVEEKRTLSWQDATVDLSKDSI 
QKDDKTIFLSKTEMIILEMLINKRNQIVTRDTLITALWDDEAFVSDNTLTVNVNRLRKKL
SEIDMDSAIETKVGKGYLAHE*

Sequence 2235 

Contig_0730_pos_5606_4977,

putative peptide of unknown function

atgaaattattgatagatcaagagaatgatgatcagcgtaagcgagcgttattatttgaa 
tggtctcgtattaatgagatgttagataagcaattatatttaacaaggcttgaaacacat 
catcgtgatatgtattttgattatatttcattaaagagaatggttatagatgaaatacaa 
gttactcgacatatcagtcaggcaaaagggataggttttgaattagattttaaagctcgaa 

caaaaggtttatacagatgttaaatggtgccgtatgatgattaggcaagttctatctaac
tctttgaaatatagtgataattctacaataaatttaagtggttataacatagaaggacac 
gttgttttaaaaattaaagactacggtcgtggaattagtaaaagagatttaccacgtata 
tttgatagaggatttacttctacaacagaccgcaacgatactgcgtcttctggtatggga 
ttataccttgtacaaagcgtgaaagaacaacttgggattgaagttaaagttgattcaata 

gtggggaaaggaacaacgttttatttcattttcccacaacaaaatgaaatcattgagcgc
atgtctaaagtgacakgattgtcattttaa 

Sequence 2236 

MKLLIDQENDDQRKRALLFEWSRINEMLDKQLYLTRLETHHRDMYFDYISLKRMVIDEIQ 
VTRHISQAKGIGFELDFKDEQKVYTDVKWCRMMIRQVLSNSLKYSDNSTINLSGYNIEGH
VVLKIKDYGRGISKRDLPRIFDRGFTSTTDRNDTASSGMGLYLVQSVKEQLGIEVKVDSI
VGKGTTFYFIFPQQNEIIERMSKVTRLSF* 

Sequence 2237

Contig_0730_pos_4513_4070,

is similar to (with p-value 8.0e-32) 

>sp:sp|P42423|YXDL_BACSU HYPOTHETICAL ABC TRANSPORTER ATP-BI
NDING PROTEIN IN IDH 3'REGION. >gp:gp|D14399|BACIOLO_13 Baci
llus subtilis 15 kb chromosome segment contains the iol oper
on. NID: g709980. >gp:gp|Z99124|BSUB0021_68 Bacillus subtili
s complete genome (section 21 of 21): from 3999281 to 421481
4. NID: g2636442. >gp:gp|D45912|D45912_2 Bacillus subtilis g
enome sequence between the iol and hut operon, partial and c
omplete cds. NID: g1408482.

atgcttccactatcagttcaaaagttagataaacaaataatgcatgaacgttatcaacgt
atagtagaagctttgaacattagtgatattagtgataaatatccatcagagttgtcaggt 
ggacagcgtcaacgtacctcagcagcaagggcatttattaatttaccttcaattatattt 
gctgatgagcctacaggtgctttagattctaaaagtacactagatttacttaagcgtctt
aaatatatgaatgaggaatttaacacaaccatacttatggtgacacatgatccagtagcg
gctagtttttcaaaccgtgttgtgatgttaaaggatggacaaatctttactgagttgtat 
caaggtgatgatgataaacaaacgttttataaagaaattataagaacgcaaagtgtactt 
ggtggcatcaattatgagctttaa 

Sequence 2238

MLPLSVQKLDKQIMHERYQRIVEALNISDISDKYPSELSGGQRQRTSAARAFINLPSIIF 
ADEPTGALDSKSTLDLLKRLKYMNEEFNTTILMVTHDPVAASFSNRVVMLKDGQIFTELY 
QGDDDKQTFYKEIIRTQSVLGGINYEL*

Sequence 2239

Contig_0730_pos_3876_3457,

putative peptide of unknown function 

gtggtctttttgctttatgctaattttttatttttgaaaagacgaggtcgtgaactatcg 
ttattacaaattattggtctaacaaagaaagatatcatgaaaatgattatgttggagcaa 
ttgatgacatttatgatgacaactattgtaggtatcatattgggaatctttggttcgaaa
attttactcatgattgtattgcgattattaggaatcaacgtgagtgtttctattatattt 
aattatcatgccattttagaaacgttattattaatagctgtgtcatatgtacttatagtc 
tttcaaagctatgtatatttacttaaacgttctattaaagagttagcgtctgatgtaaat 
aaaaaagagttcagtcatacacgcacaacacttggtgaagttgtattaggtttcttataa 

Sequence 2240

VVFLLYANFLFLKRRGRELSLLQIIGLTKKDIMKMIMLEQLMTFMMTTIVGIILGIFGSK 
ILLMIVLRLLGINVSVSIIFNYHAILETLLLIAVSYVLIVFQSYVYLLKRSIKELASDVN
KKEFSHTRTTLGEVVLGFL*

Sequence 2241 

Contig_0730_pos_3418_2012, 

is similar to (with p-value 2.0e-78) 

>sp:sp|Q02001|TRPE_LACLA ANTHRANILATE SYNTHASE COMPONENT I (
EC 4.1.3.27). >pir:pir|S35124|S35124 anthranilate synthase (
EC 4.1.3.27) alpha chain - Lactococcus lactis subsp. lactis
>gp:gp|M87483|LACTRPOP_2 L. lactis trpE, trpG, trpD, trpF, t
rpC, trpB trpA genes, complete cds. NID: g149514.

atggatattgtatacaaaaaggtgaatgctcaaattacgccagaagctttagcaaaatta
aaacaaaaaaagatcatttttgaaagtacaaatcaacagaaacttaaaggtaggtactcg 
atagtagtattcgatcattatggcaaaattacattagataattctcaacttttaattaag 
ttagacaatcattgtgaaatagttaagaatcaaccgtatcaacgacttaaggaatttgta 
gataaatattattttgaaatcaaagataaatatttaaaagatttaccttttatttcgggc 

tttatagggacatgtagctttgatttagtacgacatgaatttaaaaaattacaagatatt
aaattagaagatcatcaaactcatgatgtccaattttatctagtggaagatgtatttgtt 
tttgatcattataaagatgaattatatattatcgcaagtaacttattttcttatagaaca 
aaagagagattaaaggaatctattgaacgtaaaattgaagatttaaaaaacatacatttt 
tcggttgaggatataaattataaatccatccctcgacatataaccaccaatatatcagag 

caacaatttgttcaaactattagaattttaaaaaagaaaattactgaaggagatatgttt
caagtagttccttcaagaatttatagttataaacaccattttcaacacaatttacatcaa 
ttaacttttcagttatatcaaaatttaaagcgacaaaatcctagtccatatatgtattat 
attaataaagatgtaccgattgtaataggaagttctcctgaaagttttgtaaaggtaaaa 
gatggaaaagtttatacgaatcctatagctggaacaattaaaagaggtcaaaataaaaaa 

gaagatgaaaataatgaaaagacattaatgaaagatgaaaaggaattgagtgaacatcgt
atgctcgtagatttaggaagaaatgatattcatcgaataagtaaaacaggcacttcacaa 
attaccaaactaatgacaatagaacgttatgaacatgtcatgcatatcgttagtgaagtt 
attggagaattaaaaccccatctatctcctatgagcgtcatcgcaagtttgctaccaacg
ggtactgtctcaggtgcacctaaacttagagctatacagagaatatacgaatcttatcct 
tataaaagaggtatctatagcggtggtgttgggtatatcaactgtaatcatcatttagat
tttgcattggctatacgtaccatgattatcgatgaggaaaaagtcagtgtcgaggcagga 
tgtggagtagtatatgattctattccagagaaagaacttgaagaaacaaaacttaaagct 
aaaagtttattggaggtaactccatga 

Sequence 2242 

MDIVYKKVNAQITPEALAKLKQKKIIFESTNQQKLKGRYSIVVFDHYGKITLDNSQLLIK
LDNHCEIVKNQPYQRLKEFVDKYYFEIKDKYLKDLPFISGFIGTCSFDLVRHEFKKLQDI 
KLEDHQTHDVQFYLVEDVFVFDHYKDELYIIASNLFSYRTKERLKESIERKIEDLKNIHF
SVEDINYKSIPRHITTNISEQQFVQTIRILKKKITEGDMFQVVPSRIYSYKHHFQHNLHQ
LTFQLYQNLKRQNPSPYMYYINKDVPIVIGSSPESFVKVKDGKVYTNPIAGTIKRGQNKK 
EDENNEKTLMKDEKELSEHRMLVDLGRNDIHRISKTGTSQITKLMTIERYEHVMHIVSEV 
IGELKPHLSPMSVIASLLPTGTVSGAPKLRAIQRIYESYPYKRGIYSGGVGYINCNHHLD 
FALAIRTMIIDEEKVSVEAGCGVVYDSIPEKELEETKLKAKSLLEVTP*

Sequence 2243

Contig_0730_pos_1446_451,

is similar to (with p-value 8.0e-51)

>sp:sp|P17170|TRPD_LACCA ANTHRANILATE PHOSPHORIBOSYLTRANSFER
ASE (EC 2.4.2.18). >pir:pir|S42343|JS0340 anthranilate phosp
horibosyltransferase (EC 2.4.2.18) - Lactobacillus casei >gp
:gp|D00496|LBATRP_2 Lactobacillus casei DNA, trp operon (trp
D, trpC, trpF, trpB, trpA), complete cds. NID: g216754.

atgacccttcttgagaaaattaaacaaaataaatctttatctaaaaaagatatgcaatca 

tttattgttacactgtttgattcaaatatagaaaccaatgtaaaggttgaattattgaaa
gcttatacaaataaagacatgggtcaatatgagctaacgtatttagttgaatattttatc 
cagacaaactatccaaaccaaccattttataataaagctatgtgtgtttgtggcacaggt 
ggagatcaatcaaatagctttaatatttctacaactgtagcttttgttgtagcaagtgca 
ggagtgccagtcattaaacacggtaataaaagtattacttcacattcaggaagtacagat 

gtattacatgaaatgaatataaaaacaaacaaaatgaacgaagtagagcaacaattaaat
ttgaaaggattagcattcataagtgcaactgattcttatccaatgatgaaaaagcttcaa 
tcaattagaaaatcgattgcaacacctacaatttttaacttgattggaccattaattaat 
cctttcaaattaacttatcaagtgatgggggtatatgaagcttcacaacttgaaaatata 
gcacaaacattaaaggatttaggtagaaaacgagcaattttaattcatggtgcaaatggg 

atggatgaggccacgctttctggtgaaaatatcatttatgaagttagcagcgaaagagca
ttaaaaaaatatagtttaaaagcagaagaagtcggtttagcttatgcaaataatgacacg 
ttgataggtggttcacctcaaacaaataaacaaattgcattgaatatcctaagtggcacg
gatcactcaagtaaacgagatgtagttttgttaaatgctggaattgctttatatgttgct 
gagcaagtggaaagtatcaaacatggcgtagagagagcgaaatatctcattgatacaggt 

atggcaatgaaacaatatttaaaaatgggaggttaa

Sequence 2244

MTLLEKIKQNKSLSKKDMQSFIVTLFDSNIETNVKVELLKAYTNKDMGQYELTYLVEYFI
QTNYPNQPFYNKAMCVCGTGGDQSNSFNISTTVAFVVASAGVPVIKHGNKSITSHSGSTD 
VLHEMNIKTNKMNEVEQQLNLKGLAFISATDSYPMMKKLQSIRKSIATPTIFNLIGPLIN
PFKLTYQVMGVYEASQLENIAQTLKDLGRKRAILIHGANGMDEATLSGENIIYEVSSERA 
LKKYSLKAEEVGLAYANNDTLIGGSPQTNKQIALNILSGTDHSSKRDVVLLNAGIALYVA 
EQVESIKHGVERAKYLIDTGMAMKQYLKMGG*

Sequence 2245

Contig_0730_pos_447_70,

putative peptide of unknown function 

atgactattttaaatgaaattattgagtataaaaaaactttgcttgagcgtaaatactat 
gataaaaaacttgaaattttacaagataacggaaatgttaagaggagaaagctgattgat 
tcacttaactatgatagaacattatcagttattgctgaaataaaatcgaaaagcccatct
gtacctcaattaccgcaacgtgatcttgttcaacaagttaaagattatcaaaaatatggt 
gctaatgctatttcaatattaactgatgaaaaatactttggcggtagttttgaacgatta
aatcagttatcaaagataactgaacgtttaaccttttgtacctcatcagcttgctcttcg 
acgcccatattcttatga 

Sequence 2246

MTILNEIIEYKKTLLERKYYDKKLEILQDNGNVKRRKLIDSLNYDRTLSVIAEIKSKSPS 
VPQLPQRDLVQQVKDYQKYGANAISILTDEKYFGGSFERLNQLSKITERLTFCTSSACSS 
TPIFL*

Sequence 2247 

Contig_0732_pos_871_1170,

putative peptide of unknown function

atggaccatattccacaagtagttatttttaataaaaaagacttatgtaacgaacagatg
gatgtacctgtatctaaatctgcgcatgtttttgtatctagtcgtgatgaaaatgataaa 
caaaaggtgaaaaatttagtaattcaagaaataaaaaatagtctcagcccatacgaagaa 
attgtagatagtgctgatgcagatagattatattttcttaaacaacacacgcttgttact 
gaattaatatttgacgaaacacaagcatcttatcgtatcaaaggatttaaaaaattataa 

Sequence 2248 

MDHIPQVVIFNKKDLCNEQMDVPVSKSAHVFVSSRDENDKQKVKNLVIQEIKNSLSPYEE 
IVDSADADRLYFLKQHTLVTELIFDETQASYRIKGFKKL* 

Sequence 2249

Contig_0732_pos_3289_4629,

is similar to (with p-value 0.0e+00)

>sp:sp|Q59812|GLNA_STAAU GLUTAMINE SYNTHETASE (EC 6.3.1.2) (
GLUTAMATE--AMMONIA LIGASE) (GS). >gp:gp|X76490|SAGLNAR_3 S.a
ureus (bb270) glnA and glnR genes. NID: g1134885.

atgccaaaacgtagttttacaaaagatgatattcgtaaatttgctgaagaagaaaacgta 
agatatttaagattacaattcactgatattttagggactattaaaaatgttgaagt ^:cca 
gtaagtcaattagaaaaagtattagataatgaaatgatgtttgatggttcatctattgaa

ggtttcgttcgtatcgaagaatcagatatgtatttacatcctgatttagatacttgggtt
atcttcccttggactgctggacaaggaaaagttgcacgactaatctgtgatgtatttaaa 
acagatggtacaccatttgaaggtgatccacgagctaacttgaagcgtgtattaagaaga 
atggaagatatgggctttactgattttaatctagggcctgaaccagaatttttcttattt 
aaattagacgaaaaaggcgaacctacattagaattaaacgatgatggtggttatttcgat 

ttagctcctacagatttaggtgaaaattgtcgccgtgacatcgttttagaattagaagat
atgggctttgacattgaagcaagccaccatgaagtagcgccaggtcaacatgaaattgac 
tttaaatatgcagatgccgttacagcatgtgataatatccaaacatttaaactagttgtt 
aaaacaattgcacgtaagcataatttacatgcaacatttatgccaaaaccattatttggt 
gtaaacggtagtggtatgcacttcaacgtatcactatttaaaggaaaagagaatgcgttc 

tttgatcctgaaggtgatttacaattgactgatactgcatatcaatttacagctggtgtc
cttaaaaacgctagaggattcactgcagtatgtaatccaattgtcaactcatataaacgt 
cttgtaccaggttacgaagcaccatgttatattgcatggagtggtaaaaaccgttcacct 
ttagtacgtgttccaacatctagaggtctatcaactcgtattgaagtacgctcagttgac
cctgcagctaacccgtacatggcattagcagcaatcttagaagcagggttagatggaatt 

gagaataaacttgaggttccagaacctgtaaaccaaaatatctacgaaatgaatcgtgaa
gaacgagaagcggttggtatccaagacttaccttcaactttatacactgcgttaaaagca 
atgcgtgaaaataaatcaattaaaaacgcattaggtaatcatatttacaatcaatttatt 
aac:?:caacatcgattgaatgggattactatagaactcaagtatccgaatgggaaar;agaa 
cagtatectaagcaatactaa 

Sequence 2250 

MPKRSFTKDDIRKFAEEENVRYLRLQFTDILGTIKNVEVPVSQLEKVLDNEMMFDGSSIE 
GFVRIEESDMYLHPDLDTWVIFPWTAGQGKVARLICDVFKTDGTPFEGDPRANLKRVLRR 
MEDMGFTDFNLGPEPEFFLFKLDEKGEPTLELNDDGGYFDLAPTDLGENCRRDIVLELED 
MGFDIEASHHEVAPGQHEIDFKYADAVTACDNIQTFKLVVKTIARKHNLHATFMPKPLFG
VNGSGMHFNVSLFKGKENAFFDPEGDLQLTDTAYQFTAGVLKNARGFTAVCNPIVNSYKR 
LVPGYEAPCYIAWSGKNRSPLVRVPTSRGLSTRIEVRSVDPAANPYMALAAILEAGLDGI 
ENKLEVPEPVNQNIYEMNREEREAVGIQDLPSTLYTALKAMRENKSIKNALGNHIYNQFI 
NSKSIEWDYYRTQVSEWEREQYIKQY*

Sequence 2251

Contig_0732_pos_6496_6840,

putative peptide of unknown function

atgaataaagaccaaaagaatcaatatgagttggaaaagctgttaaaagaaaatgaagaa
ctaaaagcagaaaaagctttatctcaaatgaagaatgagactcgttcaatgcttaatgag 
tcaggtttagaaaacttcgatgatcaaattgttaatatattagtaaatactgatgctgaa 
aaaacaaggaaaaatgttgaatcatttactaacttacttaatcaaatggtaaaatcaaat 
gttgaaaaagcattaagacaagactcaccagtaagcactcaatcaaataaaatgacaaaa 
gatgaagaatcagattatcttgtttcagggagaaaatgtctataa

Sequence 2252 

MNKDQKNQYELEKLLKENEELKAEKALSQMKNETRSMLNESGLENFDDQIVNILVNTDAE 
KTRKNVESFTNLLNQMVKSNVEKALRQDSPVSTQSNKMTKDEESDYLVSGRKCL*

Sequence 2253 

Contig_0732_pos_8090_8434,

putative peptide of unknown function 

atgctaattattttaattttactagcattaagttttggtttaattcctatttctattaac 
ttijc jtpvattaggtttaattccctttattttatggggagctatgggatgggctacacaa
gctcctcaacaacatatattattgaaaaaacatcctgaatatggaggctctgctgtcgct 
ttaaatagttctattaattatttaggcagtgctatgggatcagcaatcggaggaattatt 
ttatttaatgctaatagtacaaatgtactaatatatagtgctttaggaattactattatt 
ggtattttattacaattactaaatttatccctagaaaaaaattaa 

Sequence 2254 

MLIILILLALSFGLIPISINLPILGLIPFILWGAMGWATQAPQQHILLKKHPEYGGSAVA 
LNSSINYLGSAMGSAIGGIILFNANSTNVLIYSALGITIIGILLQLLNLSLEKN*

Sequence 2255

Contig_0732_pos_7495_7091,

putative peptide of unknown function 

atggaaaaatcaacgcagacagacaaacaaaactctgtgaacttaaagcaaaacacaaaa 
gatcaaaataataacgcaaatgatgaagcagcttctccaactagcgaacaaaatgcagct 
atagcacaagcaaagtcatatgcaaatacattacctatctctaagaaaagtttatacaaa
caattaacttcggaatacggagagaaatatccggcagacatagcacagtatgctgttgac 
catatcagtgtagattataaaatgaatgcactgagattagcaaaaagttacgtaaaaaat 
ataaacatttctaatcaagcgttatatgatcaactcgtttcagaaaatggagaaggattt 
actcctgaagaagcacaatatgcaatgaatcatttagataggtaa 

Sequence 2256 

MEKSTQTDKQNSVNLKQNTKDQNNNANDEAASPTSEQNAAIAQAKSYANTLPISKKSLYK 
QLTSEYGEKYPADIAQYAVDHISVDYKMNALRLAKSYVKNINISNQALYDQLVSENGEGF 
TPEEAQYAMNHLDR* 

Sequence 2257 

Contig_0733_pos_5081_5728,

putative peptide of unknown function 

gtgtctacttcccaattgattgtttcgaattccggacgagctaactcagggtttttctct 
aattcagcaacagtgttgaatttagcgttagcacgttttaagatcgggtatttcccactt
gcagttgatactgaagttttttgtaccaattctgataagtcttggactgtcttaacttct 
ttttcaggaatatatttaatatcctctgggatagttacgccaacgtcatcagatttaaca 
ttgtcacgtttagccccttttgatttcatgtactgttcaaatgctagaatttcttcgttt 
gtctctggattttggtttaatttagccatagaacgtttcgctccttcttttttgtctttt 
tctttttttaattcttcttctgttggttcttctactttttcaatagtaggtgtttctggt
gtttcttcaggtttgtcatctggttttggtgcatcatcaggtttttcttcatctgaagtt 
ccttctggttcatcatcagaaggtttgttctctgattcttctccagaattaccatctttg 
ttatcttcaacttctgcaccttcatctttaggtggttcatcttgtttaggtgctgacgct 
tccatttcttttgaaagctgttcgagttcttcgtactctttcttttga

Sequence 2258

VSTSQLIVSNSGRANSGFPSNSATVLNLALARFKIGYFPLAVDTEVFCTNSDKSWTVLTS 
FSGIYLISSGIVTPTSSDLTLSRLAPFDFMYCSNARISSFVSGFWFNLAIERFAPSFLSF 
SFFNSSSVGSSTFSIVGVSGVSSGLSSGFGASSGFSSSEVPSGSSSEGLFSDSSPELPSL
LSSTSAPSSLGGSSCLGADASISFESCSSSSYSFF* 

Sequence 2259 

Contig_0733_pos_11496_10366,

putative peptide of unknown function

atgaattcgttgcacatagcaggacgtattttcaaacagacgattcgagatgtaagaaca 
ttggcactgttacttattgcacctatattactattgtcgctactatattacatttttaca
gttgccgataatacgaatggcgtaacagttggggttcacgatgtaccagattcattaatg 
actgaattacatgataaagatattcacgttaaacattataaaaatgacaatgatataagt 

gataaaattaaagacgacaaattaacaggatttttgcacagtgatggtcaaaaagtatca
gtgacttatgctaacgataatcctacacaagcaggagaactaacaggtgcaaatcaaaaa 
tggttaatgagtcataacatgaatgccatgaaagataatactaataaattgcatcaagcg 
ttaactaaaatacaacaaaaaatgcccggggatgggggagacacgcctcatcaagatatg 
gctaaaccatataaactaacaacgcactatttatatggttcatcagattctacgtatttt 

gatatgataaatcctattttaattggattttttgtctttttctttacgtttttaatttct
ggcattggcttattaaaagagcgtacttctggcacattagaacgtttacttgcctctcca 
ataaaaagaagtgaaattatttttggttatgttttcggttatggtagttttagcgttatc 
caaacaatagttgtcgtattatatgcaatttatattctgcatatagacttagtaggttcg 
atatggttcgtactattaacggcaatattaacagcgcttgtcgctgtgacattcggtata 

ttattatctacctttgcttcctcagaattccaaatgattcaatttataccattagtcata
gtgccacaagtactatttgcaggcattataccaattgaatcaatgaataaaggattacaa 
tacttttcacatatcatgccgttattctataccggccaaacgatgcaaaatattatgatc 
aagggttatggattcaacgatatttacatttatttaattgtgttattcgcatttttcatt 
ttcttattgattttaaatattataggcatgaaaagatatagaaaagtttag 

Sequence 2260

MNSLHIAGRIFKQTIRDVRTLALLLIAPILLLSLLYYIFTVADNTNGVTVGVHDVPDSLM
TELHDKDIHVKHYKNDNDISDKIKDDKLTGFLHSDGQKVSVTYANDNPTQAGELTGANQK 
WLMSHNMNAMKDNTNKLHQALTKIQQKMPGDGGDTPHQDMAKPYKLTTHYLYGSSDSTYF 
DMINPILIGFFVFFFTFLISGIGLLKERTSGTLERLLASPIKRSEIIFGYVFGYGSFSVI
QTIVVVLYAIYILHIDLVGSIWFVLLTAILTALVAVTFGILLSTFASSEFQMIQFIPLVI 
VPQVLFAGIIPIESMNKGLQYFSHIMPLFYTGQTMQNIMIKGYGFNDIYIYLIVLFAFFI 
FLLILNIIGMKRYRKV*

Sequence 2261

Contig_0733_pos_10366_9706, 
putative peptide of unknown function 

atgaaccaagatattaagtcattagttgaaaccattgtgcctcaacttgaatatttaagc 
gataaacaaagacgtgtcatagaaagtgctattgcattattcagtgaacaaggatttgat 

aaaacgagtactaaagaaattgcgcagcgtgcaaatgtcgcagaaggaacggtatttaag
cagtttaaaagtaaaagaatgttattatacgcaggattaattccaattttaagagatcat 
atcgcacctgtagctgttaaacaatttacagatgaattaaacgaagtaacccattttgat 
gcatttataaatttatttgtagaaaatagatctaaatttatttatgacaatagacgtatt 
cttaaagtcatcttaaatgaagctattactaatgaagattttcaaaatatattagttaat 

attttcacccataaattaacgagtaaattaaaagataaaattgaatggtttatcgataat
ggt ::avgcgcaatgttaaacctgagttttttatacgtacggtcgtcgcacaaai.;:tta 
aatttaaatatcccaataatagttaataatgactatactaagggtgaaaactatcagcag
tttgcgttattcgtaaaagagggcttatataggatgtttaagcgagaatag 

Sequence 2262

MNQDIKSLVETIVPQLEYLSDKQRRVIESAIALFSEQGFDKTSTKEIAQRANVAEGTVFK 
QFKSKRMLLYAGLIPILRDHIAPVAVKQFTDELNEVTHFDAFINLFVENRSKFIYDNRRI 
LKVILNEAITNEDFQNILVNIFTHKLTSKLKDKIEWFIDNGDMRNVKPEFFIRTVVAQIL 
NLNIPIIVNNDYTKGENYQQFALFVKEGLYRMFKRE* 

Sequence 2263

Contig_0733_pos_4489_3998,

putative peptide of unknown function

atgccactttttgtgacagattatttaaatgaaaaggaaagatatgtaggctatttagca
agtttatgtgcaggtttagaagtgccgtttatggttatattaggtatattatcagctaaa 
ttgccaactcgaactttattgataattggttctgtatttggtggcgcattttactttagt 
attggtgtatttaaaaattttcatatgatgcttgttggacaaatatgtttagccatcttt 
ctagcaatattactaggtcttgggattagttattttcaagatattcttcctgattttcca 
ggttatgcatcaacactttttgccaatgccatggtaataggacaacttggtggtaatttg
ctaggtggtgtgatgagtcattgggttggtttggaaaatgtattctttgtctcagcaagt 
tctatcttcgtagggatgatactcatcttgtttacgaaagatcaaaaaattacaatagaa
gatgtggagtag 

Sequence 2264

MPLFVTDYLNEKERYVGYLASLCAGLEVPFMVILGTLSAKLPTRTLLIIGSVFGGAFYFS 
IGVFKNFHMMLVGQICLAIFLAILLGLGISYFQDILPDFPGYASTLFANAMVIGQLGGNL 
LGGVMSHWVGLENVFFVSASSIFVGMILILFTKDQKITIEDVE* 

Sequence 2265

Contig_0733_pos_3991_3503, 

putative peptide of unknown function 

atgacagcgatattatggattttaattattatagcctttgcgttagcatttattggttta 
attaaaccggtgataccatcacttttaatgttatggattggttttttaatttatcaattt 

ggttttcatgagggaagattatcgtggattttttatgttgcaatgattatctttaccatt
atgatattagtagccgattttgtgatgaataaatatttcgtcaatcgctttggaggaagt 
aaaataggtgaatacacagcgctcataggtgtgattgtaggttgtttcgtttttcctccc 
tttggtattatcattattccttttgttgctgtgttcattgttgaattggttcaagggttt 
aactttcaacaagctataaaggtgagttttggctcagtgattgcatttttagcgagtaca 

attgctcaaggtctaataatgattgtaatggttatttggttctttttagatgtctttcta
ataaattaa 

Sequence 2266 

MTAILWILIIIAFALAFIGLIKPVIPSLLMLWIGFLIYQFGFHEGRLSWIFYVAMIIFTI 
MILVADFVMNKYFVNRFGGSKIGEYTALIGVIVGCFVFPPFGIIIIPFVAVFIVELVQGF
NFQQAIKVSFGSVIAFLASTIAQGLIMIVMVIWFFLDVFLIN* 

Sequence 2267 
Contig_0733_pos_2319_1897, 

putative peptide of unknown function

atgagaaaatggttaaccttactattaattacaacattggtgttaactgcatgtggtaaa 
agtaacgaaaaagcttctttagaaaaaagcattgatcagttgaaaaaagaaaataaggat 
ttaaaaaaacagaagaaaaagttacaagagcaaaaggataagcttaaacacaaacaggat
agtctccaagaagatgtaaatgacttgcctgctaaaagcacatcccgagataagaaaaat 

aaagataatcatgatgcaaaagaaaagtcttcagataatcaatcgacatctgctaatcat
gatgatcaaactaacaaaataaaaagcaatcaagatgaacatgacagtcaatcctctaaa 
ccacatacacagcagaagccctcacagaatgatagaaaaaataatcatcgacaagaacga 
tag 

Sequence 2268

MRKWLTLLLITTLVLTACGKSNEKASLEKSIDQLKKENKDLKKQKKKLQEQKDKLKHKQD 
SLQEDVNDLPAKSTSRDKKNKDNHDAKEKSSDNQSTSANHDDQTNKIKSNQDEHDSQSSK 
PHTQQKPSQNDRKNNHRQER* 

Sequence 2269

Contig_0733_pos_765_244,

putative peptide of unknown function 

atgtcaaaaatcttaaacacacaattaactggtatttttaatcggcttgaaaaacaagag 
ttggatattcaaatggcagctcaatgtctcattcaagcaattggtggagaaggacatgtc 
tatatcaaaggctacgatgatttaaaattctatgagtcattcatattacaaagccatgaa
aaattagcgtctagcttaccacttgaagatttacaaaattttaacgatatagatacaaca 
gatagggtactgttattttcaccatactacacttcggaagttgaaagtgatgtacttcaa

cttattgatttagatgtcgatttagtgcttatttgtaataaccctaaacgagatgatttt 
cctaatcatttaattcattatgttaatttatcaacacctaggcccattgtttacacagaa
gattatgataaaatcattcaaccacatccgatggccttaaattatatttattatgatatt 
tatactcaaatgattgagatgactagagacctagatttatag 

Sequence 2270 

MSKILNTQLTGIFNRLEKQELDIQMAAQCLIQAIGGEGHVYIKGYDDLKFYESFILQSHE
KLASSLPLEDLQNFNDIDTTDRVLLFSPYYTSEVESDVLQLIDLDVDLVLICNNPKRDDF 
PNHLIHYVNLSTPRPIVYTEDYDKIIQPHPMALNYIYYDIYTQMIEMTRDLDL* 

Sequence 2271 

Contig_0734_pos_2644_3024,

putative peptide of unknown function 

atgcaccagcttgtccatactgacgataatgcaataaatgtagatgtattaccaccacaa 
caagctgacggtaaagcgactaatccagaacaattatttgctgcaggttacgcttcatgc 
tttaatggtgcatttgatttaattttaaaacaaaataaagtgcgcgatgctgaacctgaa 
gtaacgttaacggtacgcttggaagacgatccagatgccgaaagcccaaaacttagcgtt
gatattcatgcaaaagttaaaaatgttttatcacaagaagatgctgaaaaatatttacaa 
gatgcgcacgacttttgtccgtattcaaaagctacacgtggcaatatcgatgtaaactta 
aatgttgaagtagtagaataa 

Sequence 2272

MHQLVHTDDNAINVDVLPPQQADGKATNPEQLFAAGYASCFNGAFDLILKQNKVRDAEPE 
VTLTVRLEDDPDAESPKLSVDIHAKVKNVLSQEDAEKYLQDAHDFCPYSKATRGNIDVNL 
NVEVVE*

Sequence 2273

Contig_0734_pos_3557_4369,

is similar to (with p-value 1.0e-96)

>gp:gp|Y17554|BLY17554_1 Bacillus licheniformis arcA, arcB,
arcC and arcD genes. NID: g3687415. 

atggatgataaatatccgttttatcttgacccaatgcctaatctatatttcacacgtgat
cctcaagcttcaattggtagaggtatgacagtaaatcgtatgttttggagagcgagacgc 
agagaatcgattttcatttcatatattttaaaacatcatcctagatttaaagatgagaat 
attcctttatgggtggatcgtgactgtccgttcaacatcgaaggtggagacgaactggtg 
ttatctaaagatgtacttgcaatagggatatctgaacgtacttctgcacaagcaattgaa

cgtttagcacgacgtatttttaaagatccgttatctacttttaaaaaggtggtggcgatt
gagattccaactagtcgaacatttatgcacttagatactgtttgtacaatgattgattac 
gacaaattcactacacattcagcaattcttaaatcagaaggaaacatgaatatctttatt 
atcgaatatgatgataaagctgaagatatcaaaatccaacattctagtcatcttaaacaa
acattagaagaagtgctcgatgttgatgaaatcacattaataccaactggaaatggtgat 

atcatcgacggtgctcgtgaacaatggaatgatggttcgaatactttatgcatacgtccc
ggtgtggttgtaacttatgatcgtaattatgtttctaatcaattgttacgtgagcatggt 
atcaaagttattgaaattcctggaagtgaacttgtacgtggtcgaggaggccctcgatgt 
atgagtcaacctttaataagagaagatctatag 

Sequence 2274

MDDKYPFYLDPMPNLYFTRDPQASIGRGMTVNRMFWRARRRESIFISYILKHHPRFKDEN 
IPLWVDRDCPFNIEGGDELVLSKDVLAIGISERTSAQAIERLARRIFKDPLSTFKKVVAI 
EIPTSRTFMHLDTVCTMIDYDKFTTHSAILKSEGNMNIFIIEYDDKAEDIKIQHSSHLKQ 
TLEEVLDVDEITLIPTGNGDIIDGAREQWNDGSNTLCIRPGVVVTYDRNYVSNQLLREHG 

IKVIEIPGSELVRGRGGPRCMSQPLIREDL*

Sequence 2275 

Contig_0734_pos_4456_5061,

is similar to (with p-value 2.0e-70) 

>gp:gp|AJ001330|LSAJ1330_2 Lactobacillus sake DNA encoding t
he arginine-deiminase pathway genes. NID: g2764610. 
atgaaaggggcgcatcccatgaaaaacatcaagaaaccctttgatttaaaaggtaagtca 
ttgctaaaagagtatgatcttacaggtgaagaatttgaaggtctaatcgattttgctatg 
acattaaaaaaatataaacaacaaggcacaccacatcgatatttagagggtaagaatatt
gctttactcttcgaaaagacatctactcggacgcgtgccgcatttacagttgcatctatt
gatctaggtgcacaccctgaatttttagggaaaaatgatattcaattaggaaaaaaagaa 
tctgttgaggatactgctaaagttttaggcagaatgtttgatggaattgaatttagaggt 
ttttcccaaaaaactgttgaacaattggccgaattctctggagtaccagtatggaatggg 
ttaactgatgattggcatcctacacaaatgttagctgattatatgacaattaaagaaaat
tttggatatttaaaaggcatcaacctaacttatgtaggaaacggacgtaataatgttgca 
cattcgcttatggtggcggtgctttcaatattcaatctgatatgggaggccacgctggag 
gattag 

Sequence 2276

MKGAHPMKNIKKPFDLKGKSLLKEYDLTGEEFEGLIDFAMTLKKYKQQGTPHRYLEGKNI 
ALLFEKTSTRTRAAFTVASIDLGAHPEFLGKNDIQLGKKESVEDTAKVLGRMFDGIEFRG 
FSQKTVEQLAEFSGVPVWNGLTDDWHPTQMLADYMTIKENFGYLKGINLTYVGNGRNNVA 
HSLMVAVLSIFNLIWEATLED* 

Sequence 2277 

Contig_0734_pos_5331_6341,

is similar to (with p-value 4.0e-96)

>gp:gp|Y17554|BLY17554_3 Bacillus licheniformis arcA, arcB,
arcC and arcD genes. NID: g3687415.

gtggcatcaatattattgtggagtgtgcactttttaatattaaaaggagttgagacggct
gctttaatcaatagtattgtcactattacaaaattaattcctatcttattagtgattata 
tgtatgattgtagcctttaattttaatacattcaagataggtttctttggaatggatgga 
tatggatcattatcatttcatttcgctaatacgatgtcacaagttaaaagtacgatgtta

gtgacagtttgggtatttattggtattgaaggtgctgttgtattctccggaagagctaaa
aataaaaaagatgttggaactgccactgttatcggacttatttcagtattgctcatttac 
ttcttgctgactgtattagcacaaggtattgtaattcaaaatcatatttctaaacttgag 
gcaccatcaatggcacaaattttagcttatattgttggtgattggggagcaacatttgtc 
aatattggtcttattatttcagtattaggtgcatggctaggttggacattacttgccgga 

gagttgccttttattgtagctaaagatggtttgtttcctaagtggtttgcaaaagaaaat
aaaaatggggcacctatgaacgccttattcattactaatgtgttagttcaaatattcctt 
attagtatgctgtttaccaaaagtgcttatcattttgcattttctctcgcggcaagtgct 
atactttatccatacatgttcagtgcgttttatcaggtgaaatatacaatagaacacaag 
ttaactgcaacgcctaaacaatggattataggaattctagcatctatttatgcaatttgg 

ctggtatacgcgtcaggtatagattacttgttacttaccatgttgctctatattccgggg
attatcgtctatgttgttgttcagaaaaataatcaaaagcgacttacacaatttgactat 
attttcttcagtcttatcgttattttagcattgatagggttattacgatga 

Sequence 2278 

VASILLWSVHFLILKGVETAALINSIVTITKLIPILLVIICMIVAFNFNTFKIGFFGMDG
YGSLSFHFANTMSQVKSTMLVTVWVFIGIEGAVVFSGRAKNKKDVGTATVIGLISVLLIY 
FLLTVLAQGIVIQNHISKLEAPSMAQILAYIVGDWGATFVNIGLIISVLGAWLGWTLLAG 
ELPFIVAKDGLFPKWFAKENKNGAPMNALFITNVLVQIFLISMLFTKSAYHFAFSLAASA 
ILYPYMFSAFYQVKYTIEHKLTATPKQWIIGILASIYAIWLVYASGIDYLLLTMLLYIPG
IIVYVVVQKNNQKRLTQFDYIFFSLIVILALIGLLR*

Sequence 2279 

Contig_0734_pos_2115_1636, 
putative peptide of unknown function 

gtgccagatcatatagagaaagtagtggtcgtagtaaatcctcaaatgtccaccataaag
agaataattaatcaaactgatattaacacaatccaattacatggaaatgaaagcattcaa 
ttaattagaaatattaagaaacttaattcaaaaataagaatcataaaagcaattccagca
acaagaaatttaaataataacattcaaaagtataaagatgagatagacatgtttattata 
gatacaccatcaatcacatacggagggacaggtcaaagttttgactggaaattattaaaa 
aaaataaagggcgttgattttctcattgcgggtggtttggattttgaaaagataaaacga
ttagaaatatattcatttggacaatgtggttatgacatctcaactggcattgagtcacat 
aatgaaaaagattttaataagatgactcgaatattaaaatttttgaaaggagacgaatga 

Sequence 2280 

VPDHIEKVVVVVNPQMSTIKRIINQTDINTIQLHGNESIQLIRNIKKLNSKIRIIKAIPA
TRNLNNNIQKYKDEIDMFIIDTPSITYGGTGQSFDWKLLKKIKGVDFLIAGGLDFEKIKR 
LEIYSFGQCGYDISTGIESHNEKDFNKMTRILKFLKGDE* 

Sequence 2281 

Contig_0734_pos_1632_424, 

is similar to (with p-value 0.0e+00)

>sp:sp|Q01998|TRPB_LACLA TRYPTOPHAN SYNTHASE BETA CHAIN (EC
4.2.1.20). >pir:pir|S35129|S35129 tryptophan synthase (EC 4.
2.1.20) beta chain - Lactococcus lactis subsp. lactis >gp:gp
|M87483|LACTRPOP_7 L. lactis trpE, trpG, trpD, trpF, trpC, t
rpB trpA genes, complete cds. NID: g149614.

atgaaaattcaaacagaagtagatgaattgggctttttcggtgaatatggtggccaatat 

gtacctgaaacattgatgccagctattattgaacttaaaaaagcatatgaggacgcgaaa
tcagatactcacttcaagaaagaatttaattattatttaagtgaatatgttggtagagaa 
acgcctttaacatttgctgaatcatacacaaaattgttaggtggtgccaaaatatatctt 
aaaagagaagacttaaatcacactggtgctcataaaattaataacgcgataggacaggca 
ctattagctaaaaggatggggaaaactaaattagtagccgaaacaggtgctggtcaacat 

ggtgtagcaagtgccaccatcgctgctttattcgatatggatcttattgttttcatggga
agtgaagatatcaaacgtcaacaacttaacgtatttagaatggaattgctaggagctaaa 
gtagtgtctgtgtcagatgggcaaggaacactatcagatgctgtaaataaagctttgcaa 
tattgggtgaatcatgtcgaggatacacattatttattaggctcagcgttgggacctgat 
ccgtttccaactatggtcagagattttcagagtgtgattggtaatgaaattaaaagccaa 

attttaagtaaagaaggacgacttccagatgcgttagtcgcgtgtgttggtggaggatcc
aattcaataggtacgttctatccatttatacaagatgatgttaaattatatggggtagaa 
gctgcgggaaaaggaagtcatacgcataatcatgctttagctatagggaaaggtaaacca 
ggtgtattacatggttccaaaatgtaccttattcaaaatgatgatggacaaattgaattg 
gcacactctatatcagcgggactagattatccaggtattggacctgaacattcgtattat 

aatgatattggtcgtgtatcatatgtaagtgctacagataatgaagctatggaagcactt
ataacattctcaaaagttgaaggtatcattccagcaattgaaagtgcacatgcattgagt
tatgttgaaaaattagcgccaaatatggatgaaaaagaaattattgttgtgactatttca
ggtcgtggagataaagatatggaaacaattaaacaatacaaagaaaacggtggtgaacaa 
aatgagtaa 

Sequence 2282 

MKIQTEVDELGFFGEYGGQYVPETLMPAIIELKKAYEDAKSDTHFKKEFNYYLSEYVGRE 

TPLTFAESYTKLLGGAKIYLKREDLNHTGAHKINNAIGQALLAKRMGKTKLVAETGAGQH
GVASATIAALFDMDLIVFMGSEDIKRQQLNVFRMELLGAKVVSVSDGQGTLSDAVNKALQ
YWVNHVEDTHYLLGSALGPDPFPTMVRDFQSVIGNEIKSQILSKEGRLPDALVACVGGGS
NSIGTFYPFIQDDVKLYGVEAAGKGSHTHNHALAIGKGKPGVLHGSKMYLIQNDDGQIEL
AHSISAGLDYPGIGPEHSYYNDIGRVSYVSATDNEAMEALITFSKVEGIIPAIESAHALS
YVEKLAPNMDEKEIIVVTISGRGDKDMETIKQYKENGGEQNE*

Sequence 2283

Contig_0734_pos_0_404,

is similar to (with p-value 8.0e-20) 

>sp:sp|P17166|TRPA_LACCA TRYPTOPHAN SYNTHASE ALPHA CHAIN (EC
4.2.1.20). >pir:pir|S42347|JS0344 tryptophan synthase (EC 4
.2.1.20) alpha chain - Lactobacillus casei >gp:gp|D00496|LBA
TRP_6 Lactobacillus casei DNA, trp operon (trpD, trpC, trpF,
trpB, trpA), complete cds. NID: g216754.

atgggtgatttaaattttattcatcatttaaaaacattaactgagaatggagcagacatt 

gttgaaattggtgtgccattttctgatcctgttgcagatggacctataatcatgaaagca 
gggcgcaacgctattgacgagggttcaaacattaaattcatttttgatgaattaataaaa
aataaaaatactatttcatctaagtatgtattaatgacttattataatattctaagtgct 
tatggagaagaattatttttggataagtgtgatgaagctggtgtttatggtttaattatt 
ccagatttaccttacgaacttacaaaaaagtttaaaaaagatttttatcatcattctgtt 
aaaataatatcgttaattgccatgaccgcaagtgatgctaggat

Sequence 2284 

MGDLNFIHHLKTLTENGADIVEIGVPFSDPVADGPIIMKAGRNAIDEGSNIKFIFDELIK 
NKNTISSKYVLMTYYNILSAYGEELFLDKCDEAGVYGLIIPDLPYELTKKFKKDFYHHSV
KIISLIAMTASDARX

Sequence 2285 

Contig_0735_pos_1041_2267,

is similar to (with p-value 3.0e-89)

>gp:gp|AF024571|AF024571_1 Staphylococcus aureus high affini
ty proline permease (putP) gene, complete cds. NID: g2565310

atgagtttagtatatgacaaacttactcaagatcaaccgtatcattcgtggtttaatatt 
gttgagcatttcttaccttctgatagtcatgatttgttagatattggttgcggtactggc 

aacttaacacaattactaacgtcactaggtgaagtcactggtatggatattagtgtagat
atgttatcaatagctagacaaaaaacaaatcaagtgaagtggatcgaaggtaatatgact 
cactttaatttgaacaaaaaatttaatatgattacaatattttgtgattcactgaattat 
ttagaaacattaaatgacgtaaaaatgacattcgaaagagtgtatcaacatttaaataaa 
aatggtgtttttatttttgatgtacatactgttcataaaatgaaaacattatttaataat 

aaaagttatattgatgaatctgataatgtttttgtaggttgggatgcaatatgtggggat
gaaccattagtaagatttatgtctatcaaatcacataaattactaccaaaagccagacgt 
ctgggaataagttggatggcagtcggtctattaggagcgattggtgtaggattaacagga 
atttcatttatatctgaaagacatattaaatcagaagatcctgaaacactatttattgtg
atgagtcaaatattatttcatccgcttgtaggtggatttttattagcagccatccttgct

gcaataatgagtactatctcttcacaattactagtaacatcaagttctttaactgaagat
ttctataaactaatcagaggttcagataaagcatcatcacaccaaaaagagtttgttttg 
attggacgcttatcagttctacttgttgcgatagttgctattacgattgcttggcatcca 
aacgatacaatactaaatttagttggtaatgcttgggctggttttggagctgcatttagt
cctttagtactctactctttatattggaaagatttaacacgtgcaggagctattagcgga 

atggtagctggtgctgtggttgttattgtttggatttcttggataaaacccttggctaca
atcaatgcattctttggtatgtatgaaatcattccaggtttcataattagcgtattgatt 
acctacatcgtaagtaaattaacaaaaaaacctgatgattatgttattgaaaatcttaat 
aaagttaaacacatcgttaaagaataa 

Sequence 2286

MSLVYDKLTQDQPYHSWFNIVEHFLPSDSHDLLDIGCGTGNLTQLLTSLGEVTGMDISVD 
MLSIARQKTNQVKWIEGNMTHFNLNKKFNMITIFCDSLNYLETLNDVKMTFERVYQHLNK
NGVFIFDVHTVHKMKTLFNNKSYIDESDNVFVGWDAICGDEPLVRFMSIKSHKLLPKARR 
LGISWMAVGLLGAIGVGLTGISFISERHIKSEDPETLFIVMSQILFHPLVGGFLLTy^ILA 

AIMSTISSQLLVTSSSLTEDFYKLIRGSDKASSHQKEFVLIGRLSVLLVAIVAITIAWHP
NDTILNLVGNAWAGFGAAFSPLVLYSLYWKDLTRAGAISGMVAGAVVVIVWISWIKPLAT
INAFFGMYEIIPGFIISVLITYIVSKLTKKPDDYVIENLNKVKHIVKE*

Sequence 2287 
Contig_0735_pos_4945_3824,

is similar to (with p-value 0.0e+00)

>gp:gp|AJ011676|BST011676_1 Bacillus stearothermophilus lig
gene. NID: g3688228.

atgggttatacgcaaaaatctccaagatgggcgattgcttataaatttccagctgaagaa 
gttattacaaaattattggatattgagctaagtattgggcgtacgggtgttgtgacacca
actgcaattctagaacctgtaaaagtagctggtactacagtttcaagagcctcacttcat 
aatgaagatttaatacatgaaagagatatacgtatcggagatagtgttgttattaaaaaa 
gccggggacatcatccctgaagttgtaaaaagtattttagatagacgacctaacgaatcg 
gaaatttatcatatgccaacacattgtcctagttgtggacatgaattagttcgtattgaa 
ggagaagttgctttacgttgtattaatccaaaatgtcaggcacagcttattgaaggactt
atacatttcgtttcaagacaagcgatgaatatagatggtttaggtactaaaattattcat 
cagctatacgaaaatcagttaatcaaagatgtcgcagatattttctatttgaaagaagaa 
gatttattaccattagagcgaatgggaaagaagaaagttgataatcttttattagcgata 
gaaaaatctaaagaacagtcattagagcatttattatttggacttggtattagacattta
ggtgtaaaagctagtcaagtacttgctgagcgatatgaaacgatggatcaactttttaaa 
gtaactgaaagtgaattaattgaaattcaagatattggagataaacttgcacaatctgtt 
gtaacatatctcgaaaatagtgatattcgttcattaattgaaaaattaagtaataaaaat 
gttaatatgtcttataaaggaattaaaacaactgaaatcgaaggtcatcctgattttagt 
gggaaaacaattgtattaacagggaaactcgagcaaatgacgagaaatgaagcatctgaa
tggttgaaaatgcaaggtgctaaagttacaagcagcgtgactaaaagtactgatattgtc 
atagctggagcagatgcagggtctaaattagccaaagctgagaagtatggtactgaaatt 
tggactgaagcagcatttattgaaaaacaaaatggaatctaa 

Sequence 2288

MGYTQKSPRWAIAYKFPAEEVITKLLDIELSIGRTGVVTPTAILEPVKVAGTTVSRASLH 
NEDLIHERDIRIGDSVVIKKAGDIIPEVVKSILDRRPNESEIYHMPTHCPSCGHELVRIE
GEVALRCINPKCQAQLIEGLIHFVSRQAMNIDGLGTKIIHQLYENQLIKDVADIFYLKEE 
DLLPLERMGKKKVDNLLLAIEKSKEQSLEHLLFGLGIRHLGVKASQVLAERYETMDQLFK 

VTESELIEIQDIGDKLAQSVVTYLENSDIRSLIEKLSNKNVNMSYKGIKTTEIEGHPDFS
GKTIVLTGKLEQMTRNEASEWLKMQGAKVTSSVTKSTDIVIAGADAGSKLAKAEKYGTEI 
WTEAAFIEKQNGI* 

Sequence 2289 
Contig_0735_pos_3414_2596,

is similar to (with p-value 1.0e-49)

>gp:gp|Z99107|BSUB0004_111 Bacillus subtilis complete genome
(section 4 of 21): from 600701 to 813890. NID: g2632866.
>gp:gp|Y15254|BSYERABCD_7 Bacillus subtilis 13kB DNA fragment,
from yerA to sapB gene. NID: g2577959.

atgagcgaaaaagaaaagaaaagcaaaaatgctaatgagaatcttggactcaatccatct 
cacaatggtgaaacagatgaagagaaaatagctaaaaattctccagcctatctttcaaat 
atactcgagcaggatttttatggaaatagtgattctaaaggtaaaaatataaaagggatg 
acaattggtttagctatgaatagtgtttattattacaaaaaagagaaagatggcgaaaca 

tttagtaaagatttatctgataaagagattgaaaagcaaggtaaacagatggctagtgaa
atgctttctcgtttacgtgagaatagtgatttgaaagatattcctattcattttgctatc 
tataaacaatcaagtcaagattccattacaccaggtgaatttatagttggtactacggtt 
gaagagggtaaaactaaaattaactcatgggataatattaatgaaaaagcagccttaatt
ccttcgtcaactgcagctgattatgatgaaacgttgaataataactttaaacagtt*;aat

gataatttgcaatcgtatttttcaaacttcacacaagcagttggtaaggttaaattcgta
aataaaaaagctaaacaacttacagttgatttgcctatagattattacggacaggcagaa 
acgataggtattacacaatatgttacagagcaagccgaaaaatattttgataaactagat 
gagtatgaaattagaatcaaagatggaaatactccacgtgctctcattagtaaaactaaa 
gacgataaagaaccacaagttcatatctatcataattag 


Sequence 2290 

MSEKEKKSKNANENLGLNPSHNGETDEEKIAKNSPAYLSNILEQDFYGNSDSKGKNIKGM 
TIGLAMNSVYYYKKEKDGETFSKDLSDKEIEKQGKQMASEMLSRLRENSDLKDIPIHFAI
YKQSSQDSITPGEFIVGTTVEEGKTKINSWDNINEKAALIPSSTAADYDETLNNNFKQFN
DNLQSYFSNFTQAVGKVKFVNKKAKQLTVDLPIDYYGQAETIGITQYVTEQAEKYFDKLD
EYEIRIKDGNTPRALISKTKDDKEPQVHIYHN* 

Sequence 2291 
Contig_0735_pos_858_538,
putative peptide of unknown function

atgtttatgtgccacttctttaacagatctagcaatggattgaacttgacgttcattgtt 
cccgtgacagactacaaaataaaggggataaaacctcttgggaatgatataagaatatta 
ggtgcatcaagtgatcatttgatgattgatttaaataaccaagatcattatcaaatcggt
gataaacttcaatttagcttgaattatgaagcactgtctcagagcatgtatatgaaaaat
ttaactaagttatatagtagtgattcaaaaatagaatcccttgttcagaacttcgatatg
cctatatattcccagtgctaa 

Sequence 2292 

MFMCHFFNRSSNGLNLTFIVPVTDYKIKGIKPLGNDIRILGASSDHLMIDLNNQDHYQIG
DKLQFSLNYEALSQSMYMKNLTKLYSSDSKIESLVQNFDMPIYSQC* 

Sequence 2293 

Contig_0736_pos_4884_0,

is similar to (with p-value 3.0e-23)

>gp:gp|U93876|BSU93876_19 Bacillus subtilis aminoglycoside
6-adenylyltransferase (aadK) gene, partial cds, and YrdA (yrdA),
YrdB (yrdB), hypothetical protein YrdC (yrdC), YrdD (yrdD),
hypothetical cytochrome P450 protein YrdE (yrdE), ribonuclease
inhibitor (yrdF), regulatory protein YrdG (yrdG), hypothetical
protein YrdH (yrdH), hypothetical protein YrdI (yrdI), amino acid
transporter (yrdJ), YrdK (yrdK), LysR family regulatory protein
YrdL (yrdL), YrdN (yrdN), cation transport protein YrdO (yrdO),
hypothetical protein YrdP (yrdP), LysR family transcription
regulator YrdQ (yrdQ), hypothetical protein YrdR (yrdR) and
hypothetical protein YrkA (yrkA) genes, complete cds.
NID: g1934641. >gp:gp|Z99117|BSUB0014_140 Bacillus subtilis
complete genome (section 14 of 21): from 2599451 to 2812870.
NID: g2634966.

gtgacaatcttagcgattgatattggagtgaatgtgggaatagcatcagcaattgtaaca
attgtgattatacttatttctgaagtgattcctaaatcaattgctgcaacatttcctgat 
aaaatttcaaaacttgtgtatcctatcattcatatatgtgttattgtactcaagcccatt 
acaatcttattaaacaagatgacagatggtattaatcatttactatctcgaggccaacct 
gttgaaaaaagattttctaaagaagaaattcgtacattattaaatattgcgggtagagaa 

ggtgcatttaatgagatagaaaatactcgacttcaaaacgttatggactttgaacaattg
aaggttaaggatgttgataccacgcctcgtattaatgttgtagctttttcaaaggaagta
acatatgacgaagcttatgatacagtgatgaataacccatatacaagatatccagtatat 
gatgaaaatatagatgatatcatcggcgtattccactcaaaatatttattagcttggagt 
aaaaataaagaggacgcaattactaattatgcatcaagccctttatttgtaaatgaacat 

aatagggcagaatgggtattgcgtaaaatgaccgtttcacgaaaacatttagcgattgtt
ttagatgaatttggaggtacggatgctatcgtatcgcacgaagatttaatagaagagcta 
cttggtatggatattgaggatgaaa 

Sequence 2294 

VTILAIDIGVNVGIASAIVTIVIILISEVIPKSIAATFPDKISKLVYPIIHICVIVLKPI
TILLNKMTDGINHLLSRGQPVEKRFSKEEIRTLLNIAGREGAFNEIENTRLQNVMDFEQL 
KVKDVDTTPRINVVAFSKEVTYDEAYDTVMNNPYTRYPVYDENIDDIIGVFHSKYLLAWS 
KNKEDAITNYASSPLFVNEHNRAEWVLRKMTVSRKHLAIVLDEFGGTDAIVSHEDLIEEL 
LGMDIEDEX 


Sequence 2295 

Contig_0736_pos_3556_2282,

is similar to (with p-value 0.0e+00)

>sp:sp|P37949|LEPA_BACSU GTP-BINDING PROTEIN LEPA.
>gp:gp|D84432|BACJH642_108 Bacillus subtilis DNA, 283 Kb region
containing skin element. NID: g2627063. >gp:gp|X91655|BSLEPOI-' c_l
B. subtilis lepA and hemN genes. NID: g1122397.
>gp:gp|Z99117|BSUB0014_31 Bacillus subtilis complete genome
(section 14 of 21): from 2599451 to 2812870. NID: g2634966.
atgtcaaagatatattgcattaaagtaataaaaacgttactatatcaaagatacatagaa
gtgacaggttataaagatgaaagcgagaaggataaaatggataagcaagaacgatacaat 
agaagagaaaatattagaaatttctccattattgctcatatagaccatggtaaatcgaca 
ttagctgatcgaattttagagaatacaaaatcagttgaaactcgagaaatgcaagatcaa 
ttacttgactctatggatttggaaagagaacgaggcatcactattaaactaaatgctgtt 
cgattaaaatacgaagctaaagatggagaaacttacacatttcatttgatagatacacca
ggacatgtcgactttacatatgaggtttctcgctcattagctgcatgtgaaggtgcaatt 
cttgtagttgatgctgcccaaggtatagaagcacaaaccttagcaaacgtttatttagca
ttagataacgatttggaacttttgccagttgttaataaaatagacttgcctgcagctgag 
cccgatagagttaagcaagaattagaagatgttataggtatagatcaagaagatgtagta
cttgcaagtgctaagtcaaatataggtattgaagaaattttagagaaaatagttgatgtt 
gtaccagcaccggacggtgatccagaagccccacttaaagcacttatctttgattcagaa 
tatgatccatacagaggagtaatatcttcaattcgaattattgatggtgttgttaaagct 
ggagataggattaaaatgatggctaccggtaaagaatttgaagttacagaagtcggaatc 

aatacccctaagcaactaccggtagaagaattaacagttggtgatgtgggttatattatc
gcaagtatcaaaaatgttgatgattctagagtaggtgacacaattactttagctgaaaga 
cctgctgacaaaccgttacaaggatataaaaagatgaatccaatggtattttgtggtcta 
ttccctattgacaataaagactataatgacctaagagaagctttagaaaaattacaactt 
aatgacgcatccttagagtttgaaccagagtcttcacaagcacttggttttggatacaga 

actggatttttaggaatgttacatatggagattattcaagaaagaattgaaagagaattt
ggtattgaactcattgcaacagcgccttcatcccgcagaagtatcaatgcatgtgtgaac 
accattcgcttttaa 

Sequence 2296 

MSKIYCIKVIKTLLYQRYIEVTGYKDESEKDKMDKQERYNRRENIRNFSIIAHIDHGKST
LADRILENTKSVETREMQDQLLDSMDLERERGITIKLNAVRLKYEAKDGETYTFHLIDTP

GHVDFTYEVSRSLAACEGAILVVDAAQGIEAQTLANVYLALDNDLELLPVVNKIDLPAAE
PDRVKQELEDVIGIDQEDVVLASAKSNIGIEEILEKIVDVVPAPDGDPEAPLKALIFDSE
YDPYRGVISSIRIIDGVVKAGDRIKMMATGKEFEVTEVGINTPKQLPVEELTVGDVGYII
ASIKNVDDSRVGDTITLAERPADKPLQGYKKMNPMVFCGLFPIDNKDYNDLREALEKLQL
NDASLEFEPESSQALGFGYRTGFLGMLHMEIIQERIEREFGIELIATAPSSRRSINACVN 
TIRF* 

Sequence 2297 
Contig_0737_pos_2511_2855,

putative peptide of unknown function 

atgcgtagtgtaattaagttggaaaatacatttatattcataatcacaatcgccgtttat 
gttaaattagaattttctatatggctgtttttacttttactattagttccagatatattt 
atgttaggatatgtgattaatagaaaaacagggagttatgtttacaatattggacacacg 
tatatcacacctataattatcgcgctattatatttatacattgatgaaaggttactatta
cagattgctttaatatggttagctcatattagtatggatagaactttaggtttcggactc 
aaatattcatcagatactgataaaacgataatacaaaagatgtaa 

Sequence 2298 

MRSVIKLENTFIFIITIAVYVKLEFSIWLFLLLLLVPDIFMLGYVINRKTGSYVYNIGHT
YITPIIIALLYLYIDERLLLQIALIWLAHISMDRTLGFGLKYSSDTDKTIIQKM*

Sequence 2299 

Contig_0737_pos_6151_4991, 

putative peptide of unknown function

gtgttttacgacacagacagtactgaagcgatgaaaagtcatatgagtgatttagtatta 
ggcaagcaagaacaaattgcttatatcaatcagttagaacgtggacttgaagaaaataaa 
attgaaagaaactctaattctaatgagattaatcaagttgagaatgagcttgttcctgac 
gaaacctttgaaaagaaaaaggaatatacacaacaagttttagaattacatgaaaaagag 

aacttgtatgaaaagttaaaagaaacttttgaagaagaacaaacacaaaaaaataaaaga
caaaagtttttgagaataggatttattgttttgactattctatcagcagcactttctatt 
ttttcttttttcactgcaaatcttatttttggtataatatttgctctattaactgtgatt 
tttgtagtaggtatcattttttctagatctaaagcagtagattatagcacagcaataagt 
caggaaattaatgatttagaaaaccaactcacgcaacttgaaaaagaatataatcttgac 

ttcgatttagaatatcaacaacaagttcgtgaacaatggcgtcatgctaaaaaaaataaa
aaaatacttgaagaaaaacatcaatatatcaatcaatcattaacgactgcaaatgagcga 
ttagatagtttaaaacatagcatttttaaactactaatagctggcataatagcaggactt 
ttatctggtgtagtcaaactaggttgggaggtaatgttcccaccccgaacacctagtaga 
gatgccactaacccacctcaacaactcttacaattactaggaataccgtcaaatattacc 
catctcacatacaatttttctgagcatgcattaccttggataagttttatcgtacactat
agtttttctatcgctattgcaataatctatatttatatcgcaaagaaatatacaaaaatc 
acactaggttatggtgctttatttggtatagttatttggattgtttttcatttaatctta 
atgccaattatgcatgtcgtaccgaatgcttttgatcaaccattttcagaacacctatca 
gaattttttggacacattgtttggatgatggttatagaaatggtcagaaggtatttctat
aatattcaattaaataaataa 

Sequence 2300 

VFYDTDSTEAMKSHMSDLVLGKQEQIAYINQLERGLEENKIERNSNSNEINQVENELVPD
ETFEKKKEYTQQVLELHEKENLYEKLKETFEEEQTQKNKRQKFLRIGFIVLTILSAALSI
FSFFTANLIFGIIFALLTVIFVVGIIFSRSKAVDYSTAISQEINDLENQLTQLEKEYNLD 
FDLEYQQQVREQWRHAKKNKKILEEKHQYINQSLTTANERLDSLKHSIFKLLIAGIIAGL 
LSGVVKLGWEVMFPPRTPSRDATNPPQQLLQLLGIPSNITHLTYNFSEHALPWISFIVHY 
SFSIAIAIIYIYIAKKYTKITLGYGALFGIVIWIVFHLILMPIMHVVPNAFDQPFSEHLS
EFFGHIVWMMVIEMVRRYFYNIQLNK*

Sequence 2301 

Contig_0737_pos_3491_3075,

putative peptide of unknown function 

atggaattatatagtatgcctatgtttaataaatttttagttaacgacatagataaatca
tcggaatggtatcaagagaatttaggttttaaaagtatttttaaatttaaaaatgaacaa 
aatcaaattttaatggagcatttacgattagcaaaatatcaagatttgatgttaatttct 
ggcaaacagtttgaagtcggtaatgcagtttatacaaatatacttgtaccaaatattcga 
attttaaaacaacgaataccttctcaatatatcgtggaagatcttgaagaaaaaccatgg 

aattctattgaaatgacaattaaagatttagataatcatttaattacgcttacacaaagt
aacataaaaaatgaagaatttaatgctttgatgcaacatacttcaaaaacattttaa 

Sequence 2302 

MELYSMPMFNKFLVNDIDKSSEWYQENLGFKSIFKFKNEQNQILMEHLRLAKYQDLMLIS 
GKQFEVGNAVYTNILVPNIRILKQRIPSQYIVEDLEEKPWNSIEMTIKDLDNHLITLTQS
NIKNEEFNALMQHTSKTF* 

Sequence 2303 

Contig_0738_pos_2962_3540,

is similar to (with p-value 4.0e-47)

>sp:sp|P42085|XPT_BACSU XANTHINE PHOSPHORIBOSYLTRANSFERASE
(EC 2.4.2.-). >pir:pir|S51309|S51309 xanthine
phosphoribosyltransferase - Bacillus subtilis
>gp:gp|L77246|BACYACA_2 Bacillus subtilis (YAC10-9 clone) DNA
region between the serA and kdg loci. NID: g1256615.
>gp:gp|Z99115|BSUB0012_148 Bacillus subtilis complete genome
(section 12 of 21): from 2195541 to 2409220. NID: g2634478.
>gp:gp|X83878|BSXPTPBUX_1 B.subtilis xpt and pbuX genes.
NID: g633168.

gtggagtcgttaggacgaaaagtcaaagaagatggcgttgtcatcgatgagaaaattttg 
aaggtagatggatttttaaatcatcaaattgatgcaaagttgatgaatgatgtaggtaaa
acattttatgagtctttcaaagacgctggtattactaaaattttaactattgaagcttct 
ggtattgcgcctgctattatggcttcttttcattttgatgttccttgtctatttgctaaa 
aaagctaaacctagtactttgaaagatggcttttatagcacggatattcattcatttaca 
aaaaataaaacgagtacagtcattgtatctgaagaatttttaggtgcagacgataaagta 
cttatcattgatgactttttagctaatggtgatgcttcgctaggtcttaatgacattgta
aaacaagcaaatgcgacgacagttggcgtgggtattgtggttgaaaaaagtttccaaaat 
ggtcgccaacgtttagaagatgcaggcttatatgtatcttcactttgtaaggtagcttca 
ttaaaaggcaataaggtaactcttttaggtgaagcgtaa 

Sequence 2304

VESLGRKVKEDGVVIDEKILKVDGFLNHQIDAKLMNDVGKTFYESFKDAGITKILTIEAS

GIAPAIMASFHFDVPCLFAKKAKPSTLKDGFYSTDIHSFTKNKTSTVIVSEEFLGADDKV
LIIDDFLANGDASLGLNDIVKQANATTVGVGIVVEKSFQNGRQRLEDAGLYVSSLCKVAS
LKGNKVTLLGEA* 
Sequence 2305
Contig_0738_pos_3579_4808,
is similar to (with p-value 8.0e-95)
>sp:sp|P42086|PBUX_BACSU XANTHINE PERMEASE. >pir:pir|S51310|S51310
xanthine permease - Bacillus subtilis >gp:gp|L77246|BACYACA_3
Bacillus subtilis (YAC10-9 clone) DNA region between the serA and
kdg loci. NID: g1256615. >gp:gp|Z99115|BSUB0012_147 Bacillus
subtilis complete genome (section 12 of 21): from 2195541 to
2409220. NID: g2634478. >gp:gp|X83878|BSXPTPBUX_2 B. subtilis xpt
and pbuX genes. NID: g633168.
atgtatgcaggggctattcttgttcctattattgtggggacaagcttaaaattttcagct 
gaagaaattgcttatctagttactgttgatatatttatgtgcggggtagcgacatttctt 
caagcaaataaagtcacagggactggattaccgattgtactaggatgtacgtttactgcc

gttgcacctatgatactcatcggtcaaacgaaaggacttgatgttttatatggttcgctt
ttaatatccggtatcttagttgttttaattgcaccttttttctcttatttagttaaattc 
tttccacctgttgtaacaggaagtgttgtgacaattattggaatcaatttaatgccagtt 
gcaatgaattacttggcaggtggtgaaggagcgaaaaactatggcgatactaagaattta 
atattaggtggtgttacactactcattattcttattttgcaaagatttacaaagggcttc 

ttgaaatcaattgcgatacttataggattagcaataggtactgctttagctggtatattt
ggaatggttgatatcaaacaagtgggtgatgcacattggtttggtttccctgtgccattc 
agattttctggcttcggatttgatgtcagctcaatacttgtatttttcattgttgcagtt 
gtaagtttaattgaatctactggtgtctatcatgcactgagtgaaattactggtagaaaa 
ctagaaagaaaagattttcgaaaagggtacactgcggaaggtctagcaatcattttaggt 

tcaatatttaatgcgttcccttacactgcatattcccaaaatgtaggtcttgtttcttta
tcaggagctaaaaagaacaatgtgatatatggaatggttattcttttactaatttgcggt 
tgtatacctaaattaggtgctttagctaatattattccattgccggttttaggtggagca 
atgatagcaatgtttggaatggttatggcatacggcgttagtattttgggtaacattaat 
ttccaaaatcaaaataatttattaattattgcaatttcagtagggttaggtgctggtatt 

agtgcagtacctcaagcatttaaaggattaggagaacaatttgcttggttaactcaaaat
ggtatagtgcttggcgcaatttctgcaatcatcttaaatttcttttttaatggtataaag 
tataaacaaactgaagaaaatgtgaaataa 

Sequence 2306 

MYAGAILVPIIVGTSLKFSAEEIAYLVTVDIFMCGVATFLQANKVTGTGLPIVLGCTFTA

VAPMILIGQTKGLDVLYGSLLISGILVVLIAPFFSYLVKFFPPVVTGSVVTIIGINLMPV
AMNYLAGGEGAKNYGDTKNLILGGVTLLIILILQRFTKGFLKSIAILIGLAIGTALAGIF 
GMVDIKQVGDAHWFGFPVPFRFSGFGFDVSSILVFFIVAVVSLIESTGVYHALSEITGRK 
LERKDFRKGYTAEGLAIILGSIFNAFPYTAYSQNVGLVSLSGAKKNNVIYGMVILLLICG 
CIPKLGALANIIPLPVLGGAMIAMFGMVMAYGVSILGNINFQNQNNLLIIAISVGLGAGI
SAVPQAFKGLGEQFAWLTQNGIVLGAISAIILNFFFNGIKYKQTEENVK* 

Sequence 2307 

Contig_0738_pos_4846_6312,

is similar to (with p-value 0.0e+00)

>sp:sp|P21879|IMDH_BACSU INOSINE-5'-MONOPHOSPHATE DEHYDROGENASE
(EC 1.1.1.205) (IMP DEHYDROGENASE) (IMPDH) (IMPD).
>pir:pir|S12623|DEBSMP IMP dehydrogenase (EC 1.1.1.205) -
Bacillus subtilis >gp:gp|X55669|BSIMPDE_1 Bacillus subtilis guaB
gene for IMP dehydrogenase. NID: g39958.

atgtgggaaaataaatttgctaaagaatctttaacattcgacgacgtgttactcattcca 
gctgcatcagatgttttaccaagcgatgttgacttaagtgtcaaattatcagataagatc 
aagttaaacattcctgttatctcagcaggtatggatacagtaactgaatcaaaaatggca 
attgctatggctcgacaaggcggtttaggtgttattcataagaatatgggcgtcgaagag 

caagctgatgaggtacaaaaggttaaacgttcagaaaatggtgttatttctaacccgttc
ttcttaacaccggaagaaagtgtgtatgaggctgaagcattaatgggtaaataccgtatc 
tctggtgtacccattgtcgataatcaagaggatcgcaagttgattgggattttaacaaat 
cgtgatttacgttttattgaagatttttcaattaaaatatcagatgtaatgacgaaagat 
aatttaataacagctccagttggtacgactttagatgaagccgaggctattcttcaaaaa 
cataagattgagaaacttccattagtagaaaatggtcgtttagaaggattaatcactatt
aaagatattgaaaaagtacttgaattcccatatgcagctaaagatgaacatggcagattg 
ttagctgcggcagcaatcggtacgtctaaagatactgaaattcgtgcacaaaaactagtt 
gaagctggcgtagatgcattaattattgatacagctcatggtcattctaaaggcgttatt 

aatcaagttaaacacatcaaggaaacatatcctgaaattactgttgtcgctggtaacgta
gcgactgcagaggcaacacgtgctttatttgaagcgggtgccgatgttgttaaagtaggt 
attggtccaggctcaatttgcacaacacgtgttgttgcaggtgtaggtgtgcctcaaatt 
acagcagtttatgattgtgctacagaagcccgtaagcatggtaaggctattattgctgat 
ggtggtattaagttctcaggtgatattatcaaagcattagctgctggtggtcatgcggtt 

atgttaggtagtttgttagctggtacagaagaaagtcctggtgcaactgaagtattccaa
ggtagacaatataaagtttatcgcggcatgggatctttaggtgctatggaaaaaggttca 
aatgatcgttacttccaagaagataaaacaccaagaaaatttgttcctgaaggtattgaa 
ggtcgtacagcttataaaggaccattacaagatacaatttatcaacttatgggtggcgtt 
agagctggcatgggttatactggttcagaaaacctaaaaaaattacgtgaagaagcacaa 

tttacacgtatgggaccagctggcttagctgaaagtcatcctcataatgttcaaattacg
aaagaatcaccaaactattctttctag

Sequence 2308 

MWENKFAKESLTFDDVLLIPAASDVLPSDVDLSVKLSDKIKLNIPVISAGMDTVTESKMA 
IAMARQGGLGVIHKNMGVEEQADEVQKVKRSENGVISNPFFLTPEESVYEAEALMGKYRI
SGVPIVDNQEDRKLIGILTNRDLRFIEDFSIKISDVMTKDNLITAPVGTTLDEAEAILQK 
HKIEKLPLVENGRLEGLITIKDIEKVLEFPYAAKDEHGRLLAAAAIGTSKDTEIRAQKLV 
EAGVDALIIDTAHGHSKGVINQVKHIKETYPEITVVAGNVATAEATRALFEAGADVVKVG 
IGPGSICTTRVVAGVGVPQITAVYDCATEARKHGKAIIADGGIKFSGDIIKALAAGGHAV
MLGSLLAGTEESPGATEVFQGRQYKVYRGMGSLGAMEKGSNDRYFQEDKTPRKFVPEGIE
GRTAYKGPLQDTIYQLMGGVRAGMGYTGSENLKKLREEAQFTRMGPAGLAESHPHNVQIT 
KESPNYSF* 

Sequence 2309 
Contig_0738_pos_6473_0,

is similar to (with p-value 5.0e-94)

>gp:gp|U51115|BSU51115_8 Bacillus subtilis CotA (cotA), GabP
(gabP), YeaB (yeaB), YeaC (yeaC), YebA (yebA), GMP synthetase
(guaA) genes, complete cds, and AIR carboxylase I (purE) gene,
partial cds. NID: g2239287. >gp:gp|Z99107|BSUB0004_83 Bacillus
subtilis complete genome (section 4 of 21): from 600701 to
813890. NID: g2632866.

atgactatggaaatggcgaaagagcaagagctgattcttgttttagactttggtagccaa 
tataaccagttaattacgcgtcgtatccgtgagatgggcgtttatagtgaattacatgat

cacgaaatttctattgaagaaattaaacgaatgaatcctaaaggtatcattctttcaggt
ggtccaaattcagtgtatgaagaggggtcatttaccatcgaccctgaaatttacaattta 
ggtattccagttttaggtatatgttatggtatgcaattaacgactaagcttttaggtggt 
aaagttgagcgtgccaatgagcgtgaatatggcaaagctacaattaacgctaaatcagat 
gaacttttctttggcttaccttctgaacaaacagtatggatgagtcattctgataaagta 

attgaaattcctgaaggatttgaagtgattgcagatagtccaagtactaattatgcagct
attgaagataaaaaacgtcgcatttacggtgtacaattccatccagaagtacgtcacact 
gaatatgggaacgacttactaagaaacttcgtccgccgtgtttgtaattgtacaggtgaa 
tggacgatggagaatttcattgaaattgaaattgagaaaatccgtcaacaagtaggtaat 
cgtaaagtattatgtgcaatgagtggtggagtagattcatccgtagttgctgtactttta 

cataaggcaatcggcgaccaattaacatg

Sequence 2310 

MTMEMAKEQELILVLDFGSQYNQLITRRIREMGVYSELHDHEISIEEIKRMNPKGIILSG 
GPNSVYEEGSFTIDPEIYNLGIPVLGICYGMQLTTKLLGGKVERANEREYGKATINAKSD 
ELFFGLPSEQTVWMSHSDKVIEIPEGFEVIADSPSTNYAAIEDKKRRIYGVQFHPEVRHT
EYGNDLLRNFVRRVCNCTGEWTMENFIEIEIEKIRQQVGNRKVLCAMSGGVDSSVVAVLL
HKAIGDQLTX

Sequence 2311 
Contig_0738_pos_2368_1961,

putative peptide of unknown function 

atggcaaaaattactgtagtgaataaccaagatgaattatataaagtcatcaatcaaaaa 
aaatctgaaggttatttagagacagaattagctgttatcagtaaaagtaagttgcactta 
gatgatttacacaactctcaaatctcgttaatggctacaagtggctcatttagtgaccgt
atgtctcgtttacttacaggtgaagatggagaagaaacagtattatctcgttatgattta 
actgacaatgaactagaaggatataaacaagatattttaaacgataaaatgctcgttgtt 
gc'.aai-aqtgaccgttcttctcatgatgaagttgaagataataatgctgcatataridgaa 
gtggatattactcattatgccgcagagtctgaagggcctaaagcataa


Sequence 2312 

MAKITVVNNQDELYKVINQKKSEGYLETELAVISKSKLHLDDLHNSQISLMATSGSFSDR 
MSRLLTGEDGEETVLSRYDLTDNELEGYKQDILNDKMLVVANSDRSSHDEVEDNNAAYKE 
VDITHYAAESEGPKA* 


Sequence 2313 

Contig_0738_pos_1833_1282, 

putative peptide of unknown function 

atggaatttaaagtgatagaatcagctaaagatccattatttaacgaggcactcaaatta 
tatgatgataaattggatattggtttagatgaagatagtaaaatttttaaacgctcactt
gaaaataataaaacagaaaatgattacgcctttatcgttggaattgaaaaccagactgta 
gttagcttagcaactgcacattacgaagcaacaaccaattctgcatttttaatttactta 
atcgcaaaagaaagccccaatcatgatgaaagaatgtccttaactttagaggcaatagaa 
aaacaattaaaccttttatcacaagaagttcataatagagatattaatttcatcatgtta
gaagttccaaaagaaccctcgactgctaacatcgatgacaagctcagaaatgcactagaa
catcgtcgtcaattcctctttgaaaatcaatttgaaaagcaggacgacattgactatatt 
catccaaaccaaaaccaaaaggaaacgccacaaaaagtagatttatttattaaagcr.aac 
attg.?dtt,atag 

Sequence 2314

MEFKVIESAKDPLFNEALKLYDDKLDIGLDEDSKIFKRSLENNKTENDYAFIVGIENQTV
VSLATAHYEATTNSAFLIYLIAKESPNHDERMSLTLEAIEKQLNLLSQEVHNRDINFIML 
EVPKEPSTANIDDKLRNALEHRRQFLFENQFEKQDDIDYIHPNQNQKETPQKVDLFIKAN 
IEL*


Sequence 2315 

Contig_0741_pos_1807_2565, 

putative peptide of unknown function 

atggatgacttgaaacaaaatcaatcttctaacgaaaaacctaaaggtaataaaataatt 
aatattttgatattcatcggaatgattttattaattcaaatacctattggcgtgtcacta
atagctttacctttttcagtgaaattcagtaagttaacatccatcgcattaagtatgcta 
ataactggtacagcactattaatcatatggttagttaggaattattatttgagtcataca 
tatgaaagacaatatcaatcaatgaggggaaaagatatctttattaatattggttttctg 
gtattatcaatggtttttagtattctaagtagtgtattaatggtcatatttactggcaac 
gatactacagcgaatgagaaagaaatcaatgaaagtttagatttacttttacaaaaagac
catttaccacatatttcaattgttgcaactgttgttttaatgatatgtattataggtccg 
tatttagaggaattactcttccgaggaatttttaaagaaacattatttatgaaatatcga
ttttggctaccattcattatatcttctattatttttagttcacaacatttatcaacaaat
atattttcatatgcaatttattttctaatgggttgtgtattataccttgcctataacaga 
agacgtaatatcaaagatagtatgatggttcacatgttgaataattctgtttcaacatta
ccggtatttgttggttatttatggctatattttagatag 

Sequence 2316 

MDDLKQNQSSNEKPKGNKIINILIFIGMILLIQIPIGVSLIALPFSVKFSKLTSIALSML 
ITGTALLIIWLVRNYYLSHTYERQYQSMRGKDIFINIGFLVLSMVFSILSSVLMVIFTGN
DTTANEKEINESLDLLLQKDHLPHISIVATVVLMICIIGPYLEELLFRGIFKETLFMKYR 
FWLPFIISSIIFSSQHLSTNIFSYAIYFLMGCVLYLAYNRRRNIKDSMMVHMLNNSVSTL 
PVFVGYLWLYFR* 
Sequence 2317

Contig_0741_pos_3896_4231,

putative peptide of unknown function 

gtgtatatcatgtatgagaatatacaaaatattgaagatttttaccgatttataaatgca 
cacgctttagctgttgttcatattatgagagataattgtacagtatgtcatgcagtatta
ccccaaattcaagatttactaaaggactatccgaaagcacaattaggtgtgattaatcaa 
tctaatgttgaagctattgccggagaactttctatttttacagtacctgttgatttaatt 
tttttgaaagggaaagaaatgcatagacaagcacgttttatcgatatgcaatcgtttgaa 
aaacaattgtatataatgcaaaatgccatcgattaa 


Sequence 2318 

VYIMYENIQNIEDFYRFINAHALAVVHIMRDNCTVCHAVLPQIQDLLKDYPKAQLGVINQ 
SNVEAIAGELSIFTVPVDLIFLKGKEMHRQARFIDMQSFEKQLYIMQNAID*

Sequence 2319

Contig_0741_pos_4399_5262,

is similar to (with p-value 1.0e-31)

>sp:sp|P42978|YPJC_BACSU HYPOTHETICAL 23.6 KD PROTEIN IN QCRC-DAPB
INTERGENIC REGION. >gp:gp|L38424|BACJOJC_1 Bacillus subtilis
dihydropicolinate reductase (jojE) gene, complete cds; poly(A)
polymerase (jojI) gene, complete cds; biotin acetyl-CoA-carboxylase
ligase (birA) gene, complete cds; jojC, jojD, jojF, jojG, jojH
genes, complete cds's. NID: g755600. >gp:gp|L47709|BACYPIA_9
Bacillus subtilis (clone YAC15-6B) ypiABF genes, qcrABC genes,
ypjABCDEFGHI genes, birA gene, panBCD genes, dinG gene, ypmB gene,
aspB gene, asnS gene, dnaD gene, nth gene and ypoC gene, complete
cds's. NID: g1146223. >gp:gp|Z99115|BSUB0012_191 Bacillus subtilis
complete genome (section 12 of 21): from 2195541 to 2409220.
NID: g2634478

atgaaggaggccaaaatgaagaagcaacagaataaagtccatatgattaatatatcttta 
tgccttatagggacattacttattgcaatcgcagtaaatagttttgttataccgggcaat 
ttaggtgagggtggttctataggcctttcattaattctgaattatactctaggtatttca 
cccgcactcagttccttcatcattaacgcaatattaattattgtaggttggaaatttctc

agtagaacgacagcgatttacactgctataacaatcaccgcaagttcaatttttcttgat
ttaactcacacattcggattaggtattcatgataattttattaattcaatttttgcaggt 
ttaatgttgggaatcggttctgggttagtaattactgctcatagtacattgggtggtaca 
tctgtcattgcacgtatcatttcaaaatatagtgagatgaagacgtcacaagcactactt
atattagatgcaatcatcgtgttatcatttatcgttgttttacctataacgaatgtatta

tatactattgtgatgctatttattgttgaaaaatcaatgtcttttgttgttgaaggattt
aatcctaaaaaagctgtgacagttatttcaaaatataataaagaaatcagtgctgatata 
tatgaaatgactggaagaggggcaaccttattaagtggtaaaggtgcttaccaaaaaagt 
gatacagaagttctatatgccgtggtatcccaaaatcaagttggagcaataaaaaagatt 
gttaatcaatatgatgaaaatgcctttttagtgattcatgatgtgcgtgatgtcttaggt 

aatggatttattaatattaaataa

Sequence 2320 

MKEAKMKKQQNKVHMINISLCLIGTLLIAIAVNSFVIPGNLGEGGSIGLSLILNYTLGIS 
PALSSFIINAILIIVGWKFLSRTTAIYTAITITASSIFLDLTHTFGLGIHDNFINSIFAG 
LMLGIGSGLVITAHSTLGGTSVIARIISKYSEMKTSQALLILDAIIVLSFIVVLPITNVL
YTIVMLFIVEKSMSFVVEGFNPKKAVTVISKYNKEISADIYEMTGRGATLLSGKGAYQKS 
DTEVLYAVVSQNQVGAIKKIVNQYDENAFLVIHDVRDVLGNGFINIK* 

Sequence 2321 
Contig_0741_pos_7460_8320,

is similar to (with p-value 8.0e-77)

>gp:gp|Z92954|BSZ92954_1 B. subtilis yws[A,B,C,D,E,F,G] and
gerBC genes. NID: g1894764.

gtggttaaacaaaaagcaaatgccctagttaatgagtgtatggctgtaaatccagactat
caaattacctttcaaaatgatttagtaaaagcaaatatcggtgtaattgttaatgtgatg
gaagaccatatggatgtcttaggaccgacacttaaagatgtagcgcaagcttttactgca 
acaattccatataacgggaaattagttgtaatgaaagataactatactagtttctttgca 

aaggaagctaaaaagcgtaattcagaactcattgttgtagataaagacgtcataccagaa 
tcatatttacggaagttcgattatttagtatttcctgataatgtagctattgtgttagga
atagcgcaagcagttggtgtagatgaagaaactgcattacaaggtatgttaaatgcacca 
gccgatccaggtgctgttagaattaaatatttccatgcaaatcgcacaaaaaatgtattt 
gttaatgcattcgctgctaatgaaccgcagtctacaaaagcgattttaaataaagtggaa 
tcatataattatccatacgataagaaaataatcattctcaattgtcgttcagatagggtt 
gatagaacacaactctttgttgataactttttaggtgaagtcgattacgatgttctcatt
tgtacaggaaaaagtacacaaatggtgacacagtttatggaaactatgccagaaaaaaca
tatatcaattatgaaggacgagactttgtagagattgaaaaaggtattctacatgaagct 
gagaatgcacttgtattttgtgtaggaaacatccacggcccgggtggtagaatagcggaa 
ttcatagaagggatagaataa 


Sequence 2322 

VVKQKANALVNECMAVNPDYQITFQNDLVKANIGVIVNVMEDHMDVLGPTLKDVAQAFTA 
TIPYNGKLVVMKDNYTSFFAKEAKKRNSELIVVDKDVIPESYLRKFDYLVFPDNVAIVLG 
IAQAVGVDEETALQGMLNAPADPGAVRIKYFHANRTKNVFVNAFAANEPQSTKAILNKVE
SYNYPYDKKIIILNCRSDRVDRTQLFVDNFLGEVDYDVLICTGKSTQMVTQFMETMPEKT
YINYEGRDFVEIEKGILHEAENALVFCVGNIHGPGGRIAEFIEGIE* 

Sequence 2323 

Contig_0741_pos_8322_8774,

is similar to (with p-value 2.0e-43)

>gp:gp|Z99122|BSUB0019_86 Bacillus subtilis complete genome
(section 19 of 21): from 3597091 to 3809700. NID: g2636029.
>gp:gp|Z92954|BSZ92954_2 B. subtilis yws[A,B,C,D,E,F,G] and
gerBC genes. NID: g1894764.

atgataggttcagaattatatttctccttattcgtaggtgtcgtactcagtttgatattt
gctgagaaatttgggattaatccagcagggttagtcgttccaggttatttagctttgatt
tttgatcaaccgatcatgttgttatcagtattaatcattagttgcttaacttattttatc 
gtaagcaacggtattagtaagtgggttattttatatggtagaagaaaattcgctgccatg 
atactgacgggaatggtgattaaatttatatttgatctcttgtacccattgaccccattt 

gaaatggttgaagtttcaggtataggtgttgtcattcctggtattattgcgaatacaatt
caaaaacaaggtgtagtcattacactttctacaacaatgttattaacatgtattacatat 
atcatcttatttttatatagttttattaattaa 

Sequence 2324 

MIGSELYFSLFVGVVLSLIFAEKFGINPAGLVVPGYLALIFDQPIMLLSVLIISCLTYFI
VSNGISKWVILYGRRKFAAMILTGMVIKFIFDLLYPLTPFEMVEVSGIGVVIPGIIANTI
QKQGVVITLSTTMLLTCITYIILFLYSFIN* 

Sequence 2325 
Contig_0741_pos_8798_9400,

putative peptide of unknown function 

atgacaaagaaaaaacgtttatcgcctagtgagtggttgcttaaacaatctaaaagacat 
aaaaggaaaaatacactttacacggcaattgtacttttagtagcgttagttctactcata 
tttgctgttaaatcaatacaagtagaacctgtaaaaagtgatacgagagacaaagatagc

attc,ytf=:icacctatttaggtaacgtcactttaaataaacatattcgacaaactaacttg
aatgatgtttttaaaggtattcaagatactttagatcatagtgatttttcaacaggttca
ttaatagtaaatgatttttcaagaaatcaaaaagataacataaataaaaatattgaaaat 
atcatgtttctacgcaagcataatgttaaaagtgttaacttaatcaacgaatctatggat 
aatattcaagcgacagcaatgatgagaaaaatagattcccaagcaggttataatttttta 

55 acaggtaatggttcaaatccaattaatagtaaaactgtacaacaagacattaaaggtaaa 
aaaatagctaaaacataccatattgttgcagactatctaattgatgtaaatccggattgt 
taa 

Sequence 2326 
MTKKKRLSPSEWLLKQSKRHKRKNTLYTAIVLLVALVLLIFAVKSIQVEPVKSDTRDKDS
IR:-.TYLGNVTLNKHIRQTNLNDVFKGIQDTLDHSDFSTGSLIVNDFSRNQKDNINKNIEN

IMFLRKHNVKSVNLINESMDNIQATAMMRKIDSQAGYNFLTGNGSNPINSKTVQQDIKGK
KIAKTYHIVADYLIDVNPDC*


Sequence 2327 

Contig_0741_pos_10400_10837,
putative peptide of unknown function 

atgagcattgaatttagacatcaatcatggtttgacaatcagtataaagaacaaacttta 
tccttcttaacacaacatcaaatcattcatgcagtggtagatgaacctcaagttaaagag
gggagcgttcctttagtaaataggattactagtgaaattgcttttgtacgttatcatgga 
cgtaatcattatggttggactaaaaaagatatgactgatcaagaatggcgagatgtaaga 
tatttatatgattatagcgatgatgagttagctgacttggctcgtaaagtcgaaatactt 
aatcaaaaggctaagaaagtatatgtaatttttaataataactctggcggtcatgcagct 
aataatgctaaaaagtatcaaaatattttagacattgattatgaaggtttagcaccgcaa
caattaaaactattttaa 

Sequence 2328 

MSIEFRHQSWFDNQYKEQTLSFLTQHQIIHAVVDEPQVKEGSVPLVNRITSEIAFVRYHG
RNHYGWTKKDMTDQEWRDVRYLYDYSDDELADLARKVEILNQKAKKVYVIFNNNSGGHAA

NNAKKYQNILDIDYEGLAPQQLKLF*

Sequence 2329 

Contig_0741_pos_11189_0,

putative peptide of unknown function

gtgtttatgatatttgtgtcgatattactgatgattcgtcacaaaatcaaaccttttaaa 
atttttgacaaacctaaatatgcgcgtacatatgttgatgctgaagggaaaacataccgt 
tatagtgtaccacccctgtttgcttttataacaacgttatttattgggctattaacagga 
ctgtttggcataggtggaggtgcattgatgacccctcttatgctcatcgtctttagattt 

ccaccacatgttgcagtaggcacaagtatgatgatgattttcttttcaagtgtgatgagt
tcaatagggcacatctttcaaggacatgtggcttggggctattctatcattctcattatt 
tcaagtgttataggtgcacaaattgaggacttcatttctcccacatcaggattgaaatat 
gagcgccatcaatataacttatacggccaacttcatagaaaagaatattattatgatgac 
tcttctt 


Sequence 2330 

VFMIFVSILLMIRHKIKPFKIFDKPKYARTYVDAEGKTYRYSVPPLFAFITTLFIGLLTG
LFGIGGGALMTPLMLIVFRFPPHVAVGTSMMMIFFSSVMSSIGHIFQGHVAWGYSIILII
SSVIGAQIEDFISPTSGLKYERHQYNLYGQLHRKEYYYDDSSX


Sequence 2331 

Contig_0741_pos_9875_9345,

is similar to (with p-value 1.0e-58)

>sp:sp|P23920|SYM_BACST METHIONYL-TRNA SYNTHETASE (EC 6.1.1.10)
(METHIONINE--TRNA LIGASE) (METRS). >pir:pir|S16682|S16682
methionine--tRNA ligase (EC 6.1.1.10) - Bacillus
stearothermophilus >gp:gp|X57925|BSMETSG_1 B.stearothermophilus
metS gene for methionyl-tRNA synthetase. NID: g39988.
gtggtaagagattacttaatgcgtgaattaccgtttggttctgatggcgtatttacaccg 

gaagcctttgttgaaagaacaaattacgatcttgcgaatgatttaggtaatctagtgaat
cgtactatctctatgataaacaaatatttccacggcgaattacctgcataccaaggtcca 
aaacatgaattggatgaaaaaatggaagcgatggcgcttgaaactgttaaatcattcaat 
gataatatggaaagtttacaattttctgttgctttatcaacagtatggaaatttattagt 
cgtacaaacaaatatattgatgaaactcaaccttgggttcttgcaaaagatgaaaatcaa 

cgtgagatgcttggtaatgtaatggcacatcttgtcgagaacattcgtttcgctacaatc
ttattacaaccattcttgacgcatgcacctagagagatatttaagcaacttaatattaac 
aatccggatttacatcaattagatagtctgcaacaatatggtatgttttag 

Sequence 2332 

WO 01/34809    PCT/US00/30782

VVRDYLMRELPFGSDGVFTPEAFVERTNYDLANDLGNLVNRTISMINKYFHGELPAYQGP 
KHELDEKMEAMALETVKSFNDNMESLQFSVALSTVWKFISRTNKYIDETQPWVLAKDENQ 
REMLGNVMAHLVENIRFATILLQPFLTHAPREIFKQLNINNPDLHQLDSLQQYGMF* 

Sequence 2333

Contig_0741_pos_6714_6397, 

putative peptide of unknown function 

atggagaagtcagttaaacttgctgttggcatctatctagcaattattttaattatttgt 
agtatttacctcgcatttatacttattggaagcttaaatggtaaagacatgagtaattct 
gttttagatactgatcactctcgtatcaacaatacttcaagaaacagtaacgaagatgtt
acgtcatcaaataatgagtcaaacaatacaaaagcgcactcatttgcaaactctgaatat 
aaagctattaacataaacgaagcatttaaaaataataagcaaattaaaaaagcgaattcg 
agttatcaatactattga 

Sequence 2334

MEKSVKLAVGIYLAIILIICSIYLAFILIGSLNGKDMSNSVLDTDHSRINNTSRNSNEDV 
TSSNNESNNTKAHSFANSEYKAININEAFKNNKQIKKANSSYQYY* 

Sequence 2335 
Contig_0741_pos_2363_2061,

putative peptide of unknown function 

atgaatggtagccaaaatcgatatttcataaataatgtttctttaaaaattcctcggaag 
agtaattcctctaaatacggacctataatacatatcattaaaacaacagttgcaacaatt 
gaaatatgtggtaaatggtctttttgtaaaagtaaatctaaactttcattgatttctttc 
tcattcgctgtagtatcgttgccagtaaatatgaccattaatacactacttagaatacta
aaaaccattgataataccagaaaaccaatattaataaagatatcttttcccctcattgat 
tga 

Sequence 2336 

MNGSQNRYFINNVSLKIPRKSNSSKYGPIIHIIKTTVATIEICGKWSFCKSKSKLSLISF
SFAVVSLPVNMTINTLLRILKTIDNTRKPILIKISFPLID* 

Sequence 2337

Contig_0743_pos_1628_1969,

putative peptide of unknown function

atgttcacaatcaaacctgtagaaatagctacaactaaaatggcgataacaaatgagact 
acaaattttttcttgaatagtttcgataataatacaacttctggaatacttgcaccagca 
ccaccaatgatcaaagcaacaactgtcccaagcgacattccttttgaagctaatgcttca 
gctataggtaacattgtttcaggtctaatatacattggaatgcctataacagatgcaatg 

aatacagatataacgccatcaccacttgcgtattttgtaataaatgtttcgggtacaaag
ccatatatgaacgctccaataaacacaccaataaataggtaa 

Sequence 2338 

MFTIKPVEIATTKMAITNETTNFFLNSFDNNTTSGILAPAPPMIKATTVPSDIPFEANAS 
AIGNIVSGLIYIGMPITDAMNTDITPSPLAYFVINVSGTKPYMNAPINTPINR*

Sequence 2339 

Contig_0743_pos_3389_3778,

is similar to (with p-value 2.0e-16)

>gp:gp|AF016485|AF016485_124 Halobacterium sp. NRC-1 plasmid
pNRC100, complete plasmid sequence. NID: g2822278.
atattaatctataatagcaaattaattaatgaggtgagtataatgttaaatattga.'iata 
tacg3d<:iAagctatgtgctgttctactggagtatgtggtccagaaccggatgaaa.;acta 
ataaaagcgaaccaaatcaatgaatatttaaagcaaaatcaaatagaagttcaacgttat 

aatatgaataataatccgaatgaattcattaaaaatcaagaagttattcgtttaattcaa
gaaaaaggtgatgaagttttaccaatcacttttatagaaggcggtatagctaaaacgggc 
gcttatattacccaagaagaagccgatgaaattattacagttaatcaaatgagaaatgga
ggatgctgtggtggagatggatgctgttaa 

Sequence 2340

MLIYNSKLINEVSIMLNIEIYEEAMCCSTGVCGPEPDETLIKANQINEYLKQNQIEVQRY 
NMNNNPNEFIKNQEVIRLIQEKGDEVLPITFIEGGIAKTGAYITQEEADEIITVNQMRNG 
GCCGGDGCC* 


Sequence 2341 

Contig_0743_pos_4248_4676,

putative peptide of unknown function 

atgcttgaattaccttctgcatggacagattatttaaatacaacgagtaatgacgcttct 
tgcttaggtcaattatcaggtttaaatgaaaatagagttaaatataattcagcacttgaa
aaactacgtaaccaagatgatacgaccatgatgttagttgcgagacctactcactcttct 
atatatgaaattcaaagagcgcaacaagaattcatcatccactggtggtgtgactacttc 
gcctgtatcaggatttttaactcctggtttacctggaacgtcctcttggctacctttcgg 
tgcatttggatcaaattcatccttatggcctggcttgatttcttcgccaccataatgaac 
gatttcatctactggttgttttgttactttttctgttggttcaccttcgccaactttttc
ccctgttaa 

Sequence 2342 

MLELPSAWTDYLNTTSNDASCLGQLSGLNENRVKYNSALEKLRNQDDTTMMLVARPTHSS 
IYEIQRAQQEFIIHWWCDYFACIRIFNSWFTWNVLLATFRCIWIKFILMAWLDFFATIMN
DFIYWLFCYFFCWFTFANFFPC* 

Sequence 2343 

Contig_0743_pos_5242_4406,

putative peptide of unknown function

gtggatgatgtgacaaaatatggtccagttgatggagatccgattacgtcaacggaagaa 
attccgtttgataaaaaacgcgaatttgatccaaacttagcgccaggtacagagaaagtc 
gttcaaaaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaacccatta 
acaggggaaaaagttggcgaaggtgaaccaacagaaaaagtaacaaaacaaccagtggat 

gaaatcgttcattatggtggcgaagaaatcaagccaggccataaggatgaatttgatcca
aatgcaccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaaccctgat 
acaggcgaagtagtcacaccaccagtggatgatgtgacaaaatatggtccagttgatgga 
gatccgatcacgtcaacggaagaaattccgtttgataaaaaacgcgaatttgatccaaac 
ttagcgccaggtacagagaaagtcgttcaaaaaggtgaaccaggaacaaaaacaattaca 

acaccaacaactaagaacccattaacaggggaaaaagttggcgaaggtgaaccaacagaa
aaagtaacaaaacaaccagtagatgaaatcgttcattatggtggcgaagaaatcaagcca 
ggccataaggatgaatttgatccaaatgcaccgaaaggtagccaagaggacgttccaggt 
aaaccaggagttaaaaatcctgatacaggcgaagtagtcacaccaccagtggatgatgaa 
ttcttgttgcgctctttgaatttcatatatagaagagtgagtaggtctcgcaactaa 


Sequence 2344 

VDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPGTEKVVQKGEPGTKTITTPTTKNPL 
TGEKVGEGEPTEKVTKQPVDEIVHYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPD 
TGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPGTEKVVQKGEPGTKTIT
TPTTKNPLTGEKVGEGEPTEKVTKQPVDEIVHYGGEEIKPGHKDEFDPNAPKGSQEDVPG
KPGVKNPDTGEVVTPPVDDEFLLRSLNFIYRRVSRSRN*

Sequence 2345 

Contig_0743_pos_2507_1623,

putative peptide of unknown function

atgttagattcaattatggaatttataaaaacatttgtgatgttgttttttgaattatta 
atactattcatcatcgtgagctttattgtaagtataatccagcagatagtttcagaagaa 
aaaataaaacactttttaagtaaacctaatcaagctattaattatattttgggaatggct 
tttggtgcgatgacaccattttgttcttgttctacaattcctatacttgcaggtttatta 

aattctaaagttccatttggccctgcaatgagttttttaattgcgtcacctttaatgaat
ccattaatgatatttatgttatgggccttattaggttggaaagttgctgttgtttacttt 
attttactagcactctttagtgtcttaacaggtctagtattttcaaaaatgaatttagct 
gaaacttataaaggagtaaatgttaaaggcgatggattttttgctaataaaatgggatct 
cgttttaaacaagcattaaatgatgcgtgggcatttttatatccaatgcttccttaccta 
tttattggtgtgtttattggagcgttcatatatggctttgtacccgaaacatttattaca
aaatacgcaagtggtgatggcgttatatctgtattcattgcatctgttataggcattcca 
atgtatattagacctgaaacaatgttacctatagctgaagcattagcttcaaaaggaatg 
tcgcttgggacagttgttgctttgatcattggtggtgctggtgcaagtattccagaagtt 
gtattattatcgaaactattcaagaaaaaatttgtagtctcatttgttatcgccatttta
gttgtagctatttctacaggtttgattgtgaacatcgttatttaa 

Sequence 2346

MLDSIMEFIKTFVMLFFELLILFIIVSFIVSIIQQIVSEEKIKHFLSKPNQAINYILGMA 
FGAMTPFCSCSTIPILAGLLNSKVPFGPAMSFLIASPLMNPLMIFMLWALLGWKVAVVYF
ILLALFSVLTGLVFSKMNLAETYKGVNVKGDGFFANKMGSRFKQALNDAWAFLYPMLPYL 
FIGVFIGAFIYGFVPETFITKYASGDGVISVFIASVIGIPMYIRPETMLPIAEALASKGM 
SLGTVVALIIGGAGASIPEVVLLSKLFKKKFVVSFVIAILVVAISTGLIVNIVI* 

Sequence 2347

Contig_0744_pos_3530_4705,

is similar to (with p-value 2.0e-71)

>sp:sp|P39643|AAT2_BACSU PROBABLE ASPARTATE AMINOTRANSFERASE
(EC 2.6.1.1) (TRANSAMINASE A) (ASPAT). >pir:pir|S39740|S397
40 hypothetical protein - Bacillus subtilis >gp:gp|X73124|BS
GENR_86 B. subtilis genomic region (325 to 333). NID: g413923
. >gp:gp|Z99123|BSUB0020_65 Bacillus subtilis complete genom
e (section 20 of 21): from 3798401 to 4010550. NID: g2636240

atgaggaggcatagcatggaaatgtcagaaaggctagcttcaattcctgatagctacttt
ggcaaaacaatgggccgtatagttgaacatggtcctttaccacttataaatatggcagtt 
ggaattccagatggagaaacgccaaagggtattatcaatcatttttcagaggcgctatgt 
attccagaaaatcaaaagtatggtccatttcacggcaaagatgcctttaaacaagctatt 
gttaacttctaccaaagacattacgatgttgaattagacaaagaagatgaagtttgtatt 

ttatatgggactaaaaatggtcttgttgcattacctacttgtgttgttaatcctggtgaa
attgtacttttacctgatccgggatatacagattatttagcgggggtcatgttagctgat 
gctaagccactccctttaaaattgtcgccaccaaattatttgccgaattggaatactata 
agtgctaaagttcttgagaagactaagctaatttatttaacatatcccaataatcctacc 
ggttcgacagcgacacaagatgattttgatgaagcgattcatcgttttaaaggtactcaa 

acaaagatagttcatgactttgcatatagtgcttttggatttgacgccaaaaatccaagc
atattagcttctaaaaatgcaaaagatgttgctatcgagatattctctttatctaaaggt 
tataatatgtcaggctttcgtgttgggtttgctgttggtaataaaaaaatgattcaagcg 
ttaaagaagtatcaaactcatacaaatgcaggtatgtttggagcacttcaagatgctgct 
acgtatgcactcaatcattatgatgagtttttagaaaagcaaaatgaaatatttagacgt 

agacgtgataattttgaatcacaactaaaacatgcacatttaccgtttgttcactctaag
ggaggtatttacatttggttacatacacccccgggttatgatagtgaagcattcgaacag 
ttgttattaaaagaaaagtcaattttagttgcacctggtaaaccatttggtgaaaatggt 
aatcaatatgtgagggtttcattggcgctcgatgataaacaattagaagaagcggcgaat 
cgcttaacacaattacggtatttgtatgaaagataa 


Sequence 2348 

MRRHSMEMSERLASIPDSYFGKTMGRIVEHGPLPLINMAVGIPDGETPKGIINHFSEALC 
IPENQKYGPFHGKDAFKQAIVNFYQRHYDVELDKEDEVCILYGTKNGLVALPTCVVNPGE 
IVLLPDPGYTDYLAGVMLADAKPLPLKLSPPNYLPNWNTISAKVLEKTKLIYLTYPNNPT 
GSTATQDDFDEAIHRFKGTQTKIVHDFAYSAFGFDAKNPSILASKNAKDVAIEIFSLSKG
YNMSGFRVGFAVGNKKMIQALKKYQTHTNAGMFGALQDAATYALNHYDEFLEKQNEIFRR 
RRDNFESQLKHAHLPFVHSKGGIYIWLHTPPGYDSEAFEQLLLKEKSILVAPGKPFGENG 
NQYVRVSLALDDKQLEEAANRLTQLRYLYER* 

Sequence 2349

Contig_0744_pos_4718_5716,

is similar to (with p-value 7.0e-84) 

>gp:gp|U31175|SAU31175_1 Staphylococcus aureus D-specific
2-hydroxyacid dehydrogenase (ddh) gene, complete cds. NID: g
1644432.

atgactaaaattaaattaatgggtgtcagagaagaagatgaacattatattgaaatgtgg 
tcacaacaacatgaagtggaagtggatatgtcgaaagaacagttaactgaagacaatgtc 
caatctattgaaggatttgatggactatcattgtctcaaacattaccattatcagaaaca 
atttataataaattaaatcaacttggaattcggcagatcgctcaacgaagtgctggattt 
gatggttataatttagagttagcatctaaatatggtcttattatatctaatgtgccttcc 
tattcacctcgaagcattgctgagtttaccgtgactcaagccatcaatattgtacgtcac 
tttaatcatattcaaagaaaaatgagattgcacgattttaggtgggaagcatcaatttta
tctcaatcaatcaaagatttaaaggtagcggttattggcacgggacatattggtggcatt 
gttgcacaaatattctcagaaggatatctatgtgacgttgtagcgtatgatccttttcca 
agtgaacatgtgaaaccttacgttacctataaacaaagtataaatgaggcaattaaagag 
gcagatattgtcacaatacatatgccgtcaacacaatataacaattacctgtttaatgaa 
aacatgtttcaaatgtttaaaaagggtgctgtgtttgtaaattgtgctagaggatcctta 
gtagataccaaggctttgttatctgcaatagagcaaggtcaaattaaaggtgcagcactt 
gatacttatgaatatgaaattggagtatatacgacagatagaagtgaagaaggtttgaat 
gacccacttttagaggaattaattactagagaagatattattgttacaccgcatatagca 
ttttatactgaagaggcaatcaaacatcttatttttgatgctttagatgcaacaatggaa 
gtattaaatactggcacgacggagttaagagtaaattaa 

Sequence 2350 

MTKIKLMGVREEDEHYIEMWSQQHEVEVDMSKEQLTEDNVQSIEGFDGLSLSQTLPLSET 
IYNKLNQLGIRQIAQRSAGFDGYNLELASKYGLIISNVPSYSPRSIAEFTVTQAINIVRH
FNHIQRKMRLHDFRWEASILSQSIKDLKVAVIGTGHIGGIVAQIFSEGYLCDVVAYDPFP 
SEHVKPYVTYKQSINEAIKEADIVTIHMPSTQYNNYLFNENMFQMFKKGAVFVNCARGSL 
VDTKALLSAIEQGQIKGAALDTYEYEIGVYTTDRSEEGLNDPLLEELITREDIIVTPHIA 
FYTEEAIKHLIFDALDATMEVLNTGTTELRVN* 

Sequence 2351 

Contig_0744_pos_8456_6108,
is similar to (with p-value 0.0e+00)

>sp:sp|P32113|ATKA_ENTFA POTASSIUM/COPPER-TRANSPORTING ATPAS
E A (EC 3.6.1.36). >pir:pir|A45995|A45995 Cu2+-transporting
ATPase (EC 3.6.1.-) - Enterococcus hirae >gp:gp|L13292|ENECO
PPUMP_1 Enterococcus hirae ATPase (copA) gene, complete cds;
ATPase (copB) gene, complete cds. NID: g290641.

atgacatgtgctgcgtgctcaaatcgtattgaaaagaaattgaatcgtatgaatcatgtt 
caagctaaagtgaatctgactactgaaaaagcaactatcgactatgagtctgacgattat 
catctcgaagattttgtagagcaaattcaaagtctcggctatgatgttgcagttgagcaa 
gtagaattaaatataaatggtatgacatgtgctgcatgttctaatcgtatagaaaaggtt 
ctaaatcaaacgcaaggtgtacaacaggcaacagtaaacttaactaccgaacaagcactc
atcaaatattaccctagtgctacgaacacggaagcattaattaagcgtattcaaaatatt 
ggatacgatgctgaaactaaaacttcatcaaaagcgcaatcaaatcgtaaaaaacaagag 
ttaaaacataaacgcaataaattaatcatttcagctattttatcgttgccactattatta 
gtaatggtggtgcatatctcacctatttccattccatccattttggtcaatccttgggta 
caattaattctttcaacacctgtccaatttattattgggtggcaattttacgttggcgcg
tataaaaatttgcgaaatggttcagctaacatggatgtattggttgctgttggtaccagt 
gccgcatatttttatagcatttatgaaatgatgatgtggctcacacatcaaacacatcac
ccgcatttatattttgaaacaagtgctattttaattacgttaattcttcttggtaaatat 
ttagaagcacgtgccaaatcacagactaccaatgcattaagcgaattgttaaatttacaa 
gcgaaagaagcacgagtaattaaagaaaataaagaaattatgcttccacttgataaagtt
aaagtcggagatactttactaataaaacccggcgaaaagatacctgtagatggcaaagtc 
actaaaggtgatacttctattgacgaatccatgctaactggtgagtctatacctgttgaa 
aaaagtagtggcgattcagtgattggttctaccatgaataaaaatggttcaatcatgatt 
gaagcaactcaagtaggtggtgatactgcattatcacatataattaaagtggttgaggat 
gctcaaagttctaaagcaccgattcaacgcttagctgatattatttctggatattttgtt
ccgattgtagttagcattgcggttattacttttatcatatggattatattcgttcacccc 
gggcaatttgaacctgcacttgtttcagcaatatctgttttagttattgcttgtccctgt 
gcacttggtttagcaacgcctacatctattatggtaggtacaggacgtgctgcagaaaat 
ggcatattattcaaaggaggccaatttgtagaacgtgcacattatgttgatacaatcgtg 
ctagataaaacaggcacaattactaatggtcaacctgtagtaactgattatgttggtgac
aatgatacattacaacttttagcaagtgctgaaaatgcttcagaacatcctcttgctgat 
gctattgttacttatgctaaagataaaggtcttaatttacttgataatgacacttttaaa 
tcaattccgggacatggtattaaagctacgattcatcaacaacaaatccttgtgggcaat 
cgaaaattaatgaacgattacaatatatctattagtaataaattaaatgaccaattaaat
cactatgaacatttaggtcaaacggcaatgatgattgccgtggataatcaaattaatgga 
atcattgctgttgctgatacagtaaaaaatgatgctaaacaagcgataaaagaactaaga 
aatatgaatatcgacgtggttatgctgactggtgataacaatcgaacagctcaaaccatc 
gccaaacaagttggcattgaacatgtaattgcagaagtgttgcccgaagaaaaggcacat 

caaatctctttattacaagacaaaggtaaacaggttgccatggtcggtgatggaattaat
gatgcgcctgcacttgtaaaagccgatattggaatggctataggcactggagctgaggta 
gcgattgaagctgcagatattacgattcttggtggtgacttgctattagttccaaaagct 
atcaaagcaagtaaagctacgattaaaaatattcgacaaaatttattttgggcatttgga 
tataacgtagctggcatcccaatagctgcttgtggtttattagcaccttggattgccggt 

gctgctatggcattaagttctgttagcgtagttatgaatgcattaagactgaaaaaaatg
aaactatag 

Sequence 2352 

MTCAACSNRIEKKLNRMNHVQAKVNLTTEKATIDYESDDYHLEDFVEQIQSLGYDVAVEQ
VELNINGMTCAACSNRIEKVLNQTQGVQQATVNLTTEQALIKYYPSATNTEALIKRIQNI
GYDAETKTSSKAQSNRKKQELKHKRNKLIISAILSLPLLLVMVVHISPISIPSILVNPWV
QLILSTPVQFIIGWQFYVGAYKNLRNGSANMDVLVAVGTSAAYFYSIYEMMMWLTHQTHH
PHLYFETSAILITLILLGKYLEARAKSQTTNALSELLNLQAKEARVIKENKEIMLPLDKV
KVGDTLLIKPGEKIPVDGKVTKGDTSIDESMLTGESIPVEKSSGDSVIGSTMNKNGSIMI
EATQVGGDTALSHIIKVVEDAQSSKAPIQRLADIISGYFVPIVVSIAVITFIIWIIFVHP
GQFEPALVSAISVLVIACPCALGLATPTSIMVGTGRAAENGILFKGGQFVERAHYVDTIV
LDKTGTITNGQPVVTDYVGDNDTLQLLASAENASEHPLADAIVTYAKDKGLNLLDNDTFK
SIPGHGIKATIHQQQILVGNRKLMNDYNISISNKLNDQLNHYEHLGQTAMMIAVDNQING
IIAVADTVKNDAKQAIKELRNMNIDVVMLTGDNNRTAQTIAKQVGIEHVIAEVLPEEKAH
QISLLQDKGKQVAMVGDGINDAPALVKADIGMAIGTGAEVAIEAADITILGGDLLLVPKA
IKASKATIKNIRQNLFWAFGYNVAGIPIAACGLLAPWIAGAAMALSSVSVVMNALRLKKM
KL*

Sequence 2353 
Contig_0745_pos_341_742,

is similar to (with p-value 5.0e-21) 

>sp:sp|P35154|YPUG_BACSU HYPOTHETICAL 29.6 KD PROTEIN IN RIB
T-PACB INTERGENIC REGION (ORFX7). >pir:pir|S45549|S45549 hyp
othetical protein X7 - Bacillus subtilis >gp:gp|L09228|BACDI
A_16 Bacillus subtilis spoVA to serA region. NID: g410114. >
gp:gp|Z99116|BSUB0013_34 Bacillus subtilis complete genome (
section 13 of 21): from 2395261 to 2613730. NID: g2634723.
atgtattcatgctataataatttagtgaaatatgcattaacgatgcaatttgatatagag 
gtagatgtaatgtatgaagtaaaactcgatgcatttaatggtccattagacttattattg 

catctaattcaaaaatatgaaattgatatttatgatatccctatgaaagccttaactgaa
cagtacatgcaatatgttcatgcgatgaatcagctagaaattaatgttgctagtgaatat 
ttagttatggcatcagaattactaatgattaaaagtaaattcaacgatgaatataaccaa 
aaatccaacactaaaaacatgctgtgcggcttctgcactatgaactgttgtttcaccatt 
aactgttgtagtaaaattagcatccatcgctccagtaattga 


Sequence 2354 

MYSCYNNLVKYALTMQFDIEVDVMYEVKLDAFNGPLDLLLHLIQKYEIDIYDIPMKALTE 
QYMQYVHAMNQLEINVASEYLVMASELLMIKSKFNDEYNQKSNTKNMLCGFCTMNCCFTI 
NCCSKISIHRSSN* 


Sequence 2355 
Contig_0745_pos_1717_2316,
putative peptide of unknown function

gtgacttttatgtctacgaataatgagattgaatttaaacaaatactagatcaagatact






tactcaaaaatctatgaacactatttcaaaaatcaatcaccttttaagcaaactaatttc 
tatatcgacacagagaattttaaattaaaacagcatcatgctgctttgcgtataagggta 
aaagattatatgtttgaaatgactttaaaagtccctgctgaagttggattgacagaatat 
aatcactcagtaaatatagaacctgaacttgatatgtcacttcaactttctcaattaccc 
aacgatattagaaatattttagaacaggactttaatattttagaaaatgagcttaaagta
ctaggaaacttaactacctatcgtttagaaaccgattatcaaaatgaattactagtatta 
gataagagtgaatatctcggcaaaactgattatgaattagagtttgaagttcattcttat
gatgaaggatattcaaaatttaaaactttacttcaacattttaatcttcaacatcaaaaa 
cccttgaataaagtgcaacgtttttttcaagaaaaacaaaatgcaagtgataaagagtaa 


Sequence 2356 

VTFMSTNNEIEFKQILDQDTYSKIYEHYFKNQSPFKQTNFYIDTENFKLKQHHAALRIRV 
KDYMFEMTLKVPAEVGLTEYNHSVNIEPELDMSLQLSQLPNDIRNILEQDFNILENELKV 
LGNLTTYRLETDYQNELLVLDKSEYLGKTDYELEFEVHSYDEGYSKFKTLLQHFNLQHQK
PLNKVQRFFQEKQNASDKE*

Sequence 2357 

Contig_0745_pos_2380_2763,

putative peptide of unknown function

atggaacacggtgatattatgtctaaaacaccatatgagttgattggtcaaaaagccttg 
tatcaaatgattgatcatttctatcaacttgtcgagaaagattctcgtatcaatcattta 
ttcccaggcgatttcaaggaaaccagtcgaaagcagaagcaatttttgacacagtttctt 
ggaggtcctgacttatatacccaagaacatggtcatcccatgttaaaacgaagacatatg 

gaatttacaattagcgagtatgaacgtgatgcatggcttgagaacatgcatactgctatt
caacacgccaaacttcctgcgggtgtaggcgattacttgtttgagcgattaagacttact 
gcaaatcacatggtaaattcctaa 

Sequence 2358 

MEHGDIMSKTPYELIGQKALYQMIDHFYQLVEKDSRINHLFPGDFKETSRKQKQFLTQFL
GGPDLYTQEHGHPMLKRRHMEFTISEYERDAWLENMHTAIQHAKLPAGVGDYLFEKLRLT 
ANHMVNS* 

Sequence 2359 
Contig_0745_pos_2807_3583,

putative peptide of unknown function 

atggagaataagagtcgtgaagatactaatctatcacctgttagcaaaatagaaatctat 
tctttttttgatccttttagcaaagattgttttaaattatctgcaatcttatcaaaatta 
agaattgaatataataaatatataaaggtaagacatattttaaacccttctttaaaggta 

ttaactaagtgtcaagctcaaagtacttcagattttgacaatattgcacttgcctataaa
gccgctgaacttcaaggtcgtatcagagcagaaagatttatacatttaatgcaaaatgaa 
atcattccaaaacgtgatattattaccgaagatatgatttctgattgtattaataatgcc 
ggcattgactatcaagtttttaaagaagacttgcaaaaggacaagttgactgacagcttg 
aaagttgatcttcacattgcaagagaaatggaaatagaacaagctccctcacttgttttt 

ttcagcgaaaatgttcatgaagaaggtttaaaagtcgaaggattatatccttatcatatt
tatacttacattattaatgagttaatgggacaacctatagagaaaaatcttcctccaaaa 
ttagaatactacattcaaaagaaacaactagtaacaatggaagaacttttaacgatctat 
gaatggcctgaaaaattgctaaataaagaattaaagaaactcacacttcaacaaaaagtt 
gaaaagttgcaatatccagagggagaattttggaaatctaaaatgcctcagtgtta5. 


Sequence 2360 

MENKSREDTNLSPVSKIEIYSFFDPFSKDCFKLSAILSKLRIEYNKYIKVRHILNPSLKV
LTKCQAQSTSDFDNIALAYKAAELQGRIRAERFIHLMQNEIIPKRDIITEDMISDCINNA
GIDYQVFKEDLQKDKLTDSLKVDLHIAREMEIEQAPSLVFFSENVHEEGLKVEGLYPYHI
YTYIINELMGQPIEKNLPPKLEYYIQKKQLVTMEELLTIYEWPEKLLNKELKKLTLQQKV
EKLQYPEGEFWKSKMPQC*

Sequence 2361 

Contig_0745_pos_5858_4050,
is similar to (with p-value 0.0e+00)

>gp:gp|D88209|D88209_1 Bacillus licheniformis DNA for Pz-pep
tidase, complete cds. NID: g1651215.

atgagtcaacaattaacaagagaagaacaggaacgtaaatatcctgaatatacatgggat 
ttaacaactatttttaaaagtgatgaagcatttgaagaagcttttaaaagtattgaagct
aaaataggtgaagaagaaaaatttaaaggtcatcttggtgaaagtgctgaaacattatat 
gaagcgctaagtcttgaagacgagttaggtacaaaattagaaaaggtatatgtatacgca 
catttaaaacaagatcaagatactgcaaatgataaatataccggtttagaagcgcgtgca 
catcaacttgttattaaatatagctctgcatggagttttttagtacctgaaattttacaa 

ctagatgaagctactattcaatcttttatcgattctaatgatgatttaaaacgatatgaa
ttcgatttgaaattgattaatgagaaacgtccacatatattagatgcgaatacagaaaag
ttattaacagaagcacaagacgcactttcaacgccttctaatgtatatggaatgttcagc 
aatgcagatttagaatttgaagatgctatagataaagatggtcaagcttatcctttaaca 
caaggtacatttatcaagtatttagagtctgatgatcgtgagttaagagcttctgctttt 

agaaatgtttataaagcatacggtgcgcataataacacgctaggtgctactttagctggt
gaggttaagaaaaatgtatttaatgctagaactcatcattatcgttcagcacgtgaaaga 
gctttaagtaataatcatattccagaagctgtttacgataacttaatcaaaacggtccat 
aaatacttacctttattacacagatacacgaagcttagacaagagttactaggtttagac 
gatttaaaaatgtatgatctttatacacctcttgttaaagatgtcaaatttgaaatgcca 

tatgaagaggcaaaatcctggatgttaaaagcacttgagccaatgggagaagaatactta
aacgtggttaaggaaggtctagataaccgttgggtcgatgtatatgaaaataaaggtaaa 
cgttcaggcggatattcatccggtggacatttaactaatcctttcattttacttaactgg 
tcagacactgtttctgatttatatactttagtacatgaatttggtcactctgcacatagt 
tactttagtagacagaatcaaccatcaaatttaagcgattatacaatctttgtcgctgag 

gtagcatcaacttgtaatgaggctttacttagtgactacatggacaaacatttagatgat
gaacgacgtctattgttacttaaccaagaattagaacgatttagagcaacactattccgt 
caaacaatgtttgctgaatttgaacataaaatacatcaaatagaagaagctggggagccg 
ttaacgccaaatcgtatgaatgaagaatatgctaaactgaacaaactatattttggtgaa 
gcagtagaaactgacgatgatattagtaaagaatggtcacgtattcctcatttctatatg 

aattattatgtatatcaatacgcaactggttatagtgcagctcaaagtttaagtcatcaa
attttaactgagggtcaacctgctgttgaacgatatatcaatgaattcttaaaaaagggt 
agctcaaactatccgattgaaattttaaaaaatgcaggtgttgacatgacaacacctcaa 
ccaatagaggaagcttgtgaagtattcgaacaaaaattagatgcttttgaaaagttaatg 
aaagcttag 


Sequence 2362 

MSQQLTREEQERKYPEYTWDLTTIFKSDEAFEEAFKSIEAKIGEEEKFKGHLGESAETLY 
EALSLEDELGTKLEKVYVYAHLKQDQDTANDKYTGLEARAHQLVIKYSSAWSFLVPEILQ 
LDEATIQSFIDSNDDLKRYEFDLKLINEKRPHILDANTEKLLTEAQDALSTPSNVYGMFS 

NADLEFEDAIDKDGQAYPLTQGTFIKYLESDDRELRASAFRNVYKAYGAHNNTLGATLAG
EVKKNVFNARTHHYRSARERALSNNHIPEAVYDNLIKTVHKYLPLLHRYTKLRQELLGLD
DLKMYDLYTPLVKDVKFEMPYEEAKSWMLKALEPMGEEYLNVVKEGLDNRWVDVYENKGK
RSGGYSSGGHLTNPFILLNWSDTVSDLYTLVHEFGHSAHSYFSRQNQPSNLSDYTIFVAE
VASTCNEALLSDYMDKHLDDERRLLLLNQELERFRATLFRQTMFAEFEHKIHQIEEAGEP
LTPNRMNEEYAKLNKLYFGEAVETDDDISKEWSRIPHFYMNYYVYQYATGYSAAQSLSHQ
ILTEGQPAVERYINEFLKKGSSNYPIEILKNAGVDMTTPQPIEEACEVFEQKLDAFEKLM 
KA* 

Sequence 2363 
Contig_0745_pos_1454_846,

is similar to (with p-value 2.0e-77) 

>sp:sp|O06446|SECA_STAAU PREPROTEIN TRANSLOCASE SECA SUBUNIT 
. >gp:gp|U97062|SAU97062_1 Staphylococcus aureus NCTC 8325 S
ecA (secA) gene, complete cds. NID: g2078389. 
atgggtggtattgctatacataaaggtgatattgcagaaatgagaacaggtgaagggaaa
acattgactgcaaccatgccgacgtatttgaatgctttagctggtagaggtgtacatgtt 
attacagtcaatgaatatctatcaagttcacaaagtgaagaaatggctgaactatataac 
tatcttggcttaactgtaggtttgaacttaaatagtaagtcaactgaagaaaaacgtgag 
gcttacgcacaagatatcacttatagtacgaataatgaacttgggtttgattatcttaga 
gataatatggtgaactatgctgaagagagagtaatgcgtcctctacattttgcaattatt
gatgaggtcgattccatattgatcgacgaagcaagaacacctttaattatttctggtgaa 
gcggaaaaatctacttctttatatggaaaattgattattattagttataaattattcaaa 

aataaagagcaactattttatgttgcatatgaaataaagggaaagaaagtctactataaa 
cgagatataaatcctaaagattatattaaagaacataaacctgattgtaaggatatacgt
agaaaatga 

Sequence 2364 

MGGIAIHKGDIAEMRTGEGKTLTATMPTYLNALAGRGVHVITVNEYLSSSQSEEMAELYN 
YLGLTVGLNLNSKSTEEKREAYAQDITYSTNNELGFDYLRDNMVNYAEERVMRPLHFAII
DEVDSILIDEARTPLIISGEAEKSTSLYGKLIIISYKLFKNKEQLFYVAYEIKGKKVYYK
RDINPKDYIKEHKPDCKDIRRK* 

Sequence 2365 
Contig_0745_pos_0_300,

putative peptide of unknown function 

atggattattattatcaatttcataagcatcaacattacttcctatgtcatgatatttta 
gaagatgcttggaaatcacaaaatactttttcaaaacatgatgcagtagtaagtctcatt 
ttgtttgcaacagctaattatcactatcgtcgtgggaatttaaaaggcgcattaaaatct 
tttgaaaaagctttattaactattaaccacgctaaagatgacactcaattacaccttaat
atcaatgaatttaaacaacttattgtgaaaatgattgaagcagtaaaattacaaaaaACT 



Sequence 2366 

MDYYYQFHKHQHYFLCHDILEDAWKSQNTFSKHDAVVSLILFATANYHYRRGNLKGALKS
FEKALLTINHAKDDTQLHLNINEFKQLIVKMIEAVKLQKT 

Sequence 2367 
Contig_0746_pos_541_1500, 

is similar to (with p-value 0.0e+00)

>sp:sp|P05425|ATKB_ENTFA POTASSIUM/COPPER-TRANSPORTING ATPAS
E B (EC 3.6.1.36). >pir:pir|B45995|B45995 Cu2+-transporting
ATPase (EC 3.6.1.-) - Enterococcus hirae >gp:gp|L13292|ENECO
PPUMP_2 Enterococcus hirae ATPase (copA) gene, complete cds;
ATPase (copB) gene, complete cds. NID: g290641.

atgcatcatgataaccatgcctcacatcatcatagtggccatgcacatcatcatggaaat
tttaaagttaagttttttgtttcattaatttttgcaatacctatcattctcttatcgcca 
atgatgggtgttaacttaccttttcaattcacatttccaggttctgaatgggtagtgtta 
atattaagtacaattttattcttttatggtggtaaaccgttcttgtctggtggtaaagat 

gaaattgctacaaaaaaaccaggcatgatgaccttagttgctctaggtatttcagtagct
tatatttatagcttgtatgctttttatatgaataactttagtagtgcaactggtcataca 
atggactttttttgggaattagcaaccttaattttaattatgctattaggacattggata 
gaaatgaatgctgtcggaaatgctggagatgctttaaagaaaatggcagaactattacct 
aatagtgctattaaagttatggataatggccaacgcgaagaagttaaaatatcagacatc 

atgactgatgatatcgtcgaagtaaaagccggagaaagcattccaacagatggtattatc
gttcaaggacaaacatctatagatgaatccctagtcactggagaatctaaaaaagtacaa 
aaaaatcaaaatgacaacgtcatcgggggttctattaatgggtctggaacaatacaagtc 
aaggttacagctgttggagaagatggatatctttctcaagttatgggacttgttaatcaa 
gcacaaaatgataaatctagtgctgaattgttatctgataaagtagcgggttatttattc 

tactttgctgtaagtgttggcgtgatttcttttattgtctggatgctcattcaaaatgat
gttgattttgcattagaacgtcttgtaactgtgttagtcattgcttgtcccatgctttag 



Sequence 2368 

MHHDNHASHHHSGHAHHHGNFKVKFFVSLIFAIPIILLSPMMGVNLPFQFTFPGSEWVVL
ILSTILFFYGGKPFLSGGKDEIATKKPGMMTLVALGISVAYIYSLYAFYMNNFSSATGHT
MDFFWELATLILIMLLGHWIEMNAVGNAGDALKKMAELLPNSAIKVMDNGQREEVKISDI
MTDDIVEVKAGESIPTDGIIVQGQTSIDESLVTGESKKVQKNQNDNVIGGSINGSGTIQV 
KVTAVGEDGYLSQVMGLVNQAQNDKSSAELLSDKVAGYLFYFAVSVGVISFIVWMLIQND 
VDFALERLVTVLVIACPML*

Sequence 2369 
Contig_0746_pos_1614_2543,
is similar to (with p-value 2.0e-78)

>sp:sp|P05425|ATKB_ENTFA POTASSIUM/COPPER-TRANSPORTING ATPAS
E B (EC 3.6.1.36). >pir:pir|B45995|B45995 Cu2+-transporting
ATPase (EC 3.6.1.-) - Enterococcus hirae >gp:gp|L13292|ENECO
PPUMP_2 Enterococcus hirae ATPase (copA) gene, complete cds;
ATPase (copB) gene, complete cds. NID: g290641.

atggataaaactggtactttaactgagggtaacttttctgtgaatcattatgagagcttt 
aaaaatgatttgagtaatgatacaatattaagccttttcgcctcattagaaagtcaatct 
aatcacccattagctataagtattgttgattttgcgaaaagtaaaaatgtttcatttact 
aatccacaagacgttaataatattccaggtgtcggattagaaggtctaattgataataaa 

acatataaaataacaaatgtctcttatcttgataaacataaacttaattatgacgatgac
ttgtttactaaattagctcaacaaggtaattcaatcagttatttaattgaggatcaacaa 
gtcattggcatgattgctcaaggagatcaaattaaagaaagctcaaaacaaatgatagct 
gatttactatcaagaaatattacaccagtcatgcttacaggtgacaataatgaagtggca 
cacgctgtcgcaaaagaattaggtattagtgatgttcacgcacaactcatgccagaagat 

aaggaaagcattataaaagattatcaaagtgacggtaataaagtcatgatggtcggagac
ggtatcaacgatgcgccgagtcttataagagccgatattggtatagcaattggtgcaggc 
acagatgttgcagtggattcaggtgatatcatacttgttaaaagtaatccatcagatatc 
attcatttcttgactctttcaaataatactatgagaaaaatggtgcaaaacttatggtgg 
ggtgcaggttataatattgttgctgtacctttagcagctggcgcattagcttttatcggg 

ttaatattatcaccagctgtaggagcaatattaatgtctttaagtacagttatagtagcg
attaatgcttttacattaaaattaaaataa 

Sequence 2370 

MDKTGTLTEGNFSVNHYESFKNDLSNDTILSLFASLESQSNHPLAISIVDFAKSKNVSFT 
NPQDVNNIPGVGLEGLIDNKTYKITNVSYLDKHKLNYDDDLFTKLAQQGNSISYLIEDQQ
VIGMIAQGDQIKESSKQMIADLLSRNITPVMLTGDNNEVAHAVAKELGISDVHAQLMPED 
KESIIKDYQSDGNKVMMVGDGINDAPSLIRADIGIAIGAGTDVAVDSGDIILVKSNPSDI 
IHFLTLSNNTMRKMVQNLWWGAGYNIVAVPLAAGALAFIGLILSPAVGAILMSLSTVIVA 
INAFTLKLK* 


Sequence 2371 

Contig_0746_pos_4011_4493,

putative peptide of unknown function

atgatgaaaaaagataaagacactaatgaccaaaaaagtgagagccatatgaagcataat 
gatgaaagtaaagttcctgaagatatgacatcgactaatgagggtgaatttaaagtagga
gataaagtaacgattacagcaggacatatgccaggtatgaaaggtgcagaagctactgta 
aaaggtgcgtataaaacatatgcctatgttgtaagttataaacccacaaatggaaatgaa 
aaagtaaacaatcataaatggatcgtaaacgaagaaatcaaagatgcacctaaagatgga 
ttliagtaaaggcgatactgttaaattagaagcaagtcatatgtctggta-tgaaaggtgct 
acagccaatatagataacgtgaaaaagacgactgtttacgtagttgattacaaatccaaa
gataatggtaaaatcattaaaaatcataaatggatgacaggaaatgagctgaaagcacga 
taa 

Sequence 2372 

MMKKDKDTNDQKSESHMKHNDESKVPEDMTSTNEGEFKVGDKVTITAGHMPGMKGAEATV
KGAYKTYAYVVSYKPTNGNEKVNNHKWIVNEEIKDAPKDGFSKGDTVKLEASHMSGMKGA
TANIDNVKKTTVYVVDYKSKDNGKIIKNHKWMTGNELKAR* 

Sequence 2373 
Contig_0746_pos_5109_4705,

is similar to (with p-value 8.0e-65)

>sp:sp|P30330|ARSC_STAAU ARSENATE REDUCTASE (ARSENICAL PUMP
MODIFIER). >pir:pir|D41903|D41903 arsenate reductase (EC 1.-
.-.-) - Staphylococcus aureus plasmid pI258 >gp:gp|M86824|PI
2ARSRBC_4 Plasmid pI258 arsenic resistance operon (arsRBC) g
enes, complete cds. NID: g150725.

gtgtcaataatgacaaagaaaataatttactttatatgtacaggtaattcatgccgtagc 
caaatggctgaaggctggggcaaaaacatcttaggtgatgaatggcaagtatattctggc 
ggtattgaggcacacggtgtcaatccaaaggcaattgaagcaatgaaagaagttggaatt
gatatctcaaaccatacttctaatttaattgacaacactatactaaaccaatccgattta
gtagtaactctatgtagtgatgctgacaataattgtcctatattaccgccgaatgtaaaa 
aaagaacattggggatttgatgatccagcaggcaaaccttggtcagaattccaacgtgtt 
agagatgaaatcaaaaccgccattgaatcatttaaaactagataa 


Sequence 2374 

VSIMTKKIIYFICTGNSCRSQMAEGWGKNILGDEWQVYSGGIEAHGVNPKAIEAMKEVGI 
DISNHTSNLIDNTILNQSDLVVTLCSDADNNCPILPPNVKKEHWGFDDPAGKPWSEFQRV
RDEIKTAIESFKTR* 

Sequence 2375

Contig_0746_pos_4270_3908,

putative peptide of unknown function 

atgattgtttactttttcatttccatttgtgggtttataacttacaacataggcatatgt 
tttatacgcaccttttacagtagcttctgcacctttcatacctggcatatgtcctgctgt
aatcgttactttatctcctactttaaattcaccctcattagtcgatgtcatatcttcagg 
aactttactttcatcattatgcttcatatggctctcacttttttggtcattagtgtcttt 
atcttttttcatcatattaaaatctccctattgattagtttgttacttttatttgaccca 
tcattccattatcttcatgctcaagtatatgacagtgaaacatgtatgttccagtatttt 
taa

Sequence 2376 

MIVYFFISICGFITYNIGICFIRTFYSSFCTFHTWHMSCCNRYFISYFKFTLISRCHIFR 
NFTFIIMLHMALTFLVISVFIFFHHIKISLLISLLLLFDPSFHYLHAQVYDSETCMFQYF 
*

Sequence 2377 

Contig_0747_pos_4050_4487,

is similar to (with p-value 8.0e-27)

>sp:sp|P42435|NASD_BACSU NITRITE REDUCTASE (NAD(P)H) (EC 1.6
.6.4). >gp:gp|D30689|BACNARB_4 Bacillus subtilis DNA around
narB region (nasB operon and nasA gene). NID: g710016. >gp:g
p|Z99105|BSUB0002_159 Bacillus subtilis complete genome (sec
tion 2 of 21): from 194651 to 415810. NID: g2632457. >gp:gp|
D50453|D50453_33 Bacillus subtilis DNA for 25-36 degree regi
on containing the amyE-srfA region, complete cds. NID: g1805
369.

atgttatggaggtttgttatgggaagtttttttaatcggatgactcgaaaagagaatcct 
actgtagagtctggtgttaaagattttggcgtcatatctgttgaaaatggctaccaaata 

tttatcggaggtaatggtggtactgatgttactgtaggtaaattgttaacgacagttgaa
accgaagatgaagtgattcaattatgtggtgccctcatgcagtattacagagaaacaggt 
gtttacgctgaaagaacagcaccatggttagaacgtatgggctttgaaaatgtcaagaat 
gtcttattaaatcaagaaaagcaaaaagaactgtatttaagaattatggaagccaaaaaa 
gctgttgagaatgaaccatgggaaactattgttgaaaataaagaagcacaaaaaatcttt 

gaagttgagaaggtgtaa

Sequence 2378 

MLWRFVMGSFFNRMTRKENPTVESGVKDFGVISVENGYQIFIGGNGGTDVTVGKLLTTVE 
TEDEVIQLCGALMQYYRETGVYAERTAPWLERMGFENVKNVLLNQEKQKELYLRIMEAKK 
AVENEPWETIVENKEAQKIFEVEKV*

Sequence 2379 

Contig_0747_pos_4490_4804,

is similar to (with p-value 3.0e-22)

>sp:sp|P42436|NASE_BACSU ASSIMILATORY NITRITE REDUCTASE (NAD
(P)H) SMALL SUBUNIT (EC 1.6.6.4). >gp:gp|D30689|BACNARB_5 Ba
cillus subtilis DNA around narB region (nasB operon and nasA
gene). NID: g710016. >gp:gp|Z99105|BSUB0002_158 Bacillus su
btilis complete genome (section 2 of 21): from 194651 to 415
810. NID: g2632457. >gp:gp|D50453|D50453_32 Bacillus subtili
s DNA for 25-36 degree region containing the amyE-srfA regio
n, complete cds. NID: g1805369.

atgaaagctaaagaaaagattaaagttacaacaatgaatgaaatgattcctcaaataggc 
aaaaaagtagttgtaaacgaaaaagaaataggtatttttctcacagataatggtgattta
tatgccattggaaatatatgtccacataaagaaggaccgttgtctgaagggactgtaagt 
ggtgattatgtttactgtccgttacacgatcaaaaaatagctttaaaaactggagaagta 
caacaacctgatacaggatgtgtagagacatacgaagtagaagttattgatggagatatt 
tacttatgtctataa 


Sequence 2380 

MKAKEKIKVTTMNEMIPQIGKKVVVNEKEIGIFLTDNGDLYAIGNICPHKEGPLSEGTVS 
GDYVYCPLHDQKIALKTGEVQQPDTGCVETYEVEVIDGDIYLCL* 

Sequence 2381

Contig_0747_pos_3285_2647,

is similar to (with p-value 8.0e-40)

>sp:sp|P39592|YWBI_BACSU HYPOTHETICAL TRANSCRIPTIONAL REGULA
TOR IN EPR-GALK INTERGENIC REGION. >pir:pir|S39679|S39679 hy
pothetical protein - Bacillus subtilis >gp:gp|X73124|BSGENR_
25 B. subtilis genomic region (325 to 333). NID: g413923. >gp
:gp|Z99123|BSUB0020_126 Bacillus subtilis complete genome (s
ection 20 of 21): from 3798401 to 4010550. NID: g2636240.
atgaatgacattgtgaacgttcaaaaaggtcatattaaaataggcttatcaccaatgatg 

aatgttcaaatgtttacaaatgcattgaatcagtttcacagactctatcctaatgtgaca
tatgaagtgattgagggtggtggtaaaattgttgagaacttaacatctaatgatgatgtg 
gatattggtattactacattacctgtagatcacactgaatttcattcaacttctttatat 
aatgaagaattattattagtagtaagtaatgaccatcatttagcacatttaaataaagta 
gacatggcagatttgaaagatgaagagtttgttttatttcatgatgattattatttaaaa 

gatcaaattatagagaactgtaaaaggctaggctattaccctaaaactgttgctaatatt
tctcaaattagttttatcgctaatatgattcaacaaggaataggaattagtatcgttcca 
gaaagtttagttaatttaatggggaataacgtaacgtccattcaattagagaatgttgaa 
ttatcatggcatcttggcgtgatatggagaaaagatgcttatctcaatcatgtaactcgc 
aaatggattgaatttatttctgagatgaaaccaacatag 


Sequence 2382 

MNDIVNVQKGHIKIGLSPMMNVQMFTNALNQFHRLYPNVTYEVIEGGGKIVENLTSNDDV 
DIGITTLPVDHTEFHSTSLYNEELLLVVSNDHHLAHLNKVDMADLKDEEFVLFHDDYYLK
DQIIENCKRLGYYPKTVANISQISFIANMIQQGIGISIVPESLVNLMGNNVTSIQLENVE
LSWHLGVIWRKDAYLNHVTRKWIEFISEMKPT*

Sequence 2383

Contig_0747_pos_1831_863,

is similar to (with p-value 8.0e-89)

>pir:pir|A25805|A25805 L-lactate dehydrogenase (EC 1.1.1.27)
- Bacillus subtilis 
atgaaggagttcgttaaaatgaaaaaatttgggaaaaaagttgttttagtaggagacggt 
tccgtaggttcaagttatgcatttgctatggtgactcaaggaattgcagatgaatttgta 
attattgatattgcaaaagataaagtggaagcagacgttaaagatttaaaccatggtgca 

ctttacagttcttcaccagtgactgtaaaagctggagaatatgaagattgtaaagatgca
gatttagttgttattacagcaggtgcacctcaaaaaccgggtgaaactcgtttacaactt 
gttgagaaaaatactaaaatcatgaaaagtatcgtaactagtgtcatggatagtggcttt 
gatggtttcttcctaattgctgcaaacccagttgatatcttaacacgttatgttaaagaa 
gttacaggtttaccagctgaacgtgttattggttctggtacagtgcttgatagtgcaaga 
ttcagatatttaataagtaaagaattaggtgttacatcaagtagtgttcacgctagcatt
ataggtgaacatggtgactctgaacttgcagtttggtctcaagcaaacgttggaggtatt 
tcagtgtatgatacattgaaagaagaaactggtagcgatgctaaagcgaatgaaatttat 
attaatacaagagatgctgcttacgatatcattcaagctaaaggatctacgtattatggt 
atagctctagcactattacgtatttctaaagctttactaaataatgaaaatagtattttg
acagtttctagtcaacttaatggtcaatatggatttaacgatgtttatcttggcttacca 
acacttatcaatcaaaatggtgcagttaaaatttatgaaacaccattaaatgataacgaa 
ctacaattactagaaaaatcagtgaaaactttagaagacacttatgattctataaaacat 
ttagtttaa 


Sequence 2384 

MKEFVKMKKFGKKWLVGDGSVGSSYAFAMVTQGIADEFVIIDIAKDKVEADVKDLNHGA 
LYSSSPVTVKAGEYEDCKDADLVVITAGAPQKPGETRLQLVEKNTKIMKSIVTSVMDSGF 
DGFFLIAANPVDILTRYVKEVTGLPAERVIGSGTVLDSARFRYLISKELGVTSSSVHASI 
IGEHGDSELAVWSQANVGGISVYDTLKEETGSDAKANEIYINTRDAAYDIIQAKGSTYYG
IALALLRISKALLNNENSILTVSSQLNGQYGFNDVYLGLPTLINQNGAVKIYETPLNDNE
LQLLEKSVKTLEDTYDSIKHLV* 

Sequence 2385 
Contig_0747_pos_800_450,

is similar to (with p-value 1.0e-34)

>gp:gp|L16975|LACALS_2 Lactococcus lactis alpha-acetolactate
synthase (als) gene, complete cds. NID: g473900. >gp:gp|A23
961|A23961_1 L. lactis alpha-acetolactate synthase gene. NID
: g809617.

atggcggaaaaacaatattctgcagcacaaatggtaattgatactttaaaaaataatgga 
gttgagtatgtatttggtattccaggtgcgaaaatcgactacttatttaatgcactagag
gatgacgatattgaattagtcgttacgcgtcatgaacaaaacgcagcgatgattgcacaa 
ggtattggtcgtttaacaggaaaaccaggtgtggctattactacaagtggcccaggggta 
agtaacttaactactggtttattaactgcaacttctgaaggtgaccctgtattagctatc
ggtggtcaagttaaaagaaatgacttattacgtttaacatcaatacgctaa 

Sequence 2386

MAEKQYSAAQMVIDTLKNNGVEYVFGIPGAKIDYLFNALEDDDIELVVTRHEQNAAMIAQ 
GIGRLTGKPGVAITTSGPGVSNLTTGLLTATSEGDPVLAIGGQVKRNDLLRLTSIR*

Sequence 2387 

Contig_0748_pos_1246_3342,

is similar to (with p-value 0.0e+00)

>sp:sp|P52026|DPO1_BACST DNA POLYMERASE I (EC 2.7.7.7) (POL
I). >gp:gp|L42111|BACPOL_1 Bacillus stearothermophilus DNA p
olymerase I (pol) gene, complete cds. NID: g806280.
atgaaaggtctaatgggggatacctctgacaatattcctggcgttgctggtgtcggcgaa 
aagacggctattaaattacttaatcaatttgagtcagtagaaggggtctatgaacatatt 

gaggaggtcactgcaaaaaaattaaaagaaaaactcatcaatagtaaagatgatgcctta
atgagtaaagatttagcaacaatcaatgttcacagtccgattgaagtatcattagaagat 
acaaaattaactctacaagacgacactacagaaaaaattgaactatttaaaaagctagaa 
tttaaacaactattagcagatatagacacatcctctacgaatgaagaagtcatagataaa 
acttttgaaattgagcaagactttcaaaatgtagatttgaatgatttaaacgaagcggta 

atacattttgaactcgaaggcactaattatcttaaagacactattctcaagtttggtttt
tatacaaatcatcaacatgtagtgataaatgctgaggatgtaaaggattataaacattta 
gttcaatggcttgaagataaaaatacaactaaaattgtctatgatgcaaaaaaaacttat 
gtatctgctcatcgattagggattaatatagaaaatattgaatttgatgttatgttagca 
agctatattattgacccatcacgttctattgatgacgttaaatctgtggtaagtttatat 

ggacaaaattatgtaaaagataatattacaatatttgggaaaggtaagaaacatcatata
cctgaagaaccaattctaaacgaacacattgcctctgtgacagaagctatagcagctgta 
actccaaccatgaaatcacagttagaagattataatcaaattgaactgttgaaagattta 
gaatc ac',:attagcaagaattttaagtgaaatggaagaaattggtatatataccg.:icatc 
aatgatttgaaagaaatggaattcgaaattcaaaaaaaattggatgtattaatatccaat 



WO 01/34809    PCT/US00/30782



attcatgagtcggctggtgaagcgtttaatatcaattctcctaagcaattaggtgttgtt 
ttatttgaaacattacaattgcctgtcattaagaagaccaaaacgggctattcaacagct 
gtagacgtattagaaaaactacaaggtgagcatcctattatagatgatattttagaatat 
agacaacttgctaagttgcaatctacgtatgtagagggattacaaaaagtaataagcaaa 
gatcatagaattcacacacgttttaatcaaacgcttgctcaaactggtagattatcaagt
atagatcctaatttacaaaatatacctatacgattagaagaaggaagaaagattagaaag 
gcctttaaaccaacttctaaagatagtgtgattttatctgctgattattcacaaattgag 
ttacgtgtacttgctcatattacgcaagatgaaagtttaaaacatgcatttataaacgga 
catgatattcacactgcaacagcaatgaaagtatttaatgttgaatctgaccaggttgat 

agtttaatgagacgtcaagcaaaagctgttaactttggtattgtatatggtatcagcgat
tatggattgagtcagagcttgggtattactagaaaacaagcaaaagcatttattgatgat 
tatttagctagttttccaggtgtaaaacaatatatgtcagacattgttaaagatgcaaaa 
gcacaaggttatgtggaaacactacttcatcgtcgtcgatacattcctgatataacaagt 
agaaacgttaatttaagaagttttgcagaaagaacagcaatgaatacacccatacaaggt 

agtgcagctgacataataaaattagcaatggttaaattcagtgaaaagattaaagaaact
aaatatcatgctaagttattattacaagttcatgatgaactcatatttgaaataccaaaa 
tcagaagtagaagattttagtaaatttgtagaagaaattatggaacaagcattagtgctc 
gatgtacctttaaaagtagattcgaattatggtgcaacatggtacgatgctaaataa

Sequence 2388

MKGLMGDTSDNIPGVAGVGEKTAIKLLNQFESVEGVYEHIEEVTAKKLKEKLINSKDDAL 
MSKDLATINVHSPIEVSLEDTKLTLQDDTTEKIELFKKLEFKQLLADIDTSSTNEEVIDK 
TFEIEQDFQNVDLNDLNEAVIHFELEGTNYLKDTILKFGFYTNHQHVVINAEDVKDYKHL 
VQWLEDKNTTKIVYDAKKTYVSAHRLGINIENIEFDVMLASYIIDPSRSIDDVKSVVSLY 

GQNYVKDNITIFGKGKKHHIPEEPILNEHIASVTEAIAAVTPTMKSQLEDYNQIELLKDL
ELPLARILSEMEEIGIYTDINDLKEMEFEIQKKLDVLISNIHESAGEAFNINSPKQLGVV 
LFETLQLPVIKKTKTGYSTAVDVLEKLQGEHPIIDDILEYRQLAKLQSTYVEGLQKVISK 
DHRIHTRFNQTLAQTGRLSSIDPNLQNIPIRLEEGRKIRKAFKPTSKDSVILSADYSQIE 
LRVLAHITQDESLKHAFINGHDIHTATAMKVFNVESDQVDSLMRRQAKAVNFGIVYGISD 

YGLSQSLGITRKQAKAFIDDYLASFPGVKQYMSDIVKDAKAQGYVETLLHRRRYIPDITS
RNVNLRSFAERTAMNTPIQGSAADIIKLAMVKFSEKIKETKYHAKLLLQVHDELIFEIPK 
SEVEDFSKFVEEIMEQALVLDVPLKVDSNYGATWYDAK* 

Sequence 2389 
Contig_0748_pos_4255_4857,

is similar to (with p-value 2.0e-31) 

>sp:sp|Q55515|Y553_SYNY3 HYPOTHETICAL 22.5 KD PROTEIN SLR055
3. >gp:gp|D64006|SYCSLLLH_95 Synechocystis sp. PCC6803 compl
ete genome, 25/27, 3138604-3270709. NID: g1001291.

gtgattgggataactggtggtattgccactggaaaatcaacagtttcagaattattaaca
gcatatgggtttaaaatcgtagatgctgatattgcttcacgcgaagcagttaaaaaaggc 
tctaagggtcttgaacaagttaaagagatttttggggaagaagcaattgacgaaaatggt 
gagatgaatcgtcaatatgtaggagagatagtttttaatcatcctgacttacgcgaggct 
cttaatgaaatagttcatcctattgtaagagagataatggaacaagagaaaaacaattat 

ctagaacatggatatcatgtaattatggatatcccattgttgtacgaaaatgaactacaa
gatactgtagatgaagtttgggtggtttatacatctgaaagtattcaaatcgatcgttta 
atggagaggaataatttatcattagaagatgctaaagcacgtgtttatagtcaaatatct 
atagataaaaaaagtaggatggcagatcatgtgatagataatctaggtgataaattagaa 
cttaaacagaatttacaaaaattacttgaagaagaagggtatattcaatcggagao-gaa 

tag

Sequence 2390 

VIGITGGIATGKSTVSELLTAYGFKIVDADIASREAVKKGSKGLEQVKEIFGEEAIDENG 
EMNRQYVGEIVFNHPDLREALNEIVHPIVREIMEQEKNNYLEHGYHVIMDIPLLYENELQ 
DTVDEVWVVYTSESIQIDRLMERNNLSLEDAKARVYSQISIDKKSRMADHVIDNLGDKLE
LKQNLQKLLEEEGYIQSESE* 

Sequence 2391 

Contig_0751_pos_2197_2541, 
putative peptide of unknown function

atggog<=*tgatagaagaacgtaatttatcagggcttattcaaacactaactttca=itcat 
cccatcattcaaattcttaaagagaacacattaaatcaacttaaaatactctctcattat 
ttaccagagcgacaccctgcaatggtggcaattcaatcttggtcacaatggtttactgat 
catgggattactgaaatccaccttgatgtaactgcacaagcgcctagatcttattacaaa
ggtatttttataaaatgtcatcttaaaaatactgctcatagcgttttgacaggtggatat 
tatcacggttcactagaaggttttggtttaggattaacactttaa 

Sequence 2392 

MEMIEERNLSGLIQTLTFNHPIIQILKENTLNQLKILSHYLPERHPAMVAIQSWSQWFTD
HGITEIHLDVTAQAPRSYYKGIFIKCHLKNTAHSVLTGGYYHGSLEGFGLGLTL* 



Sequence 2393 

Contig_0751_pos_2566_3180, 

is similar to (with p-value 5.0e-27)

>sp:sp|Q02129|HIS1_LACLA ATP PHOSPHORIBOSYLTRANSFERASE (EC 2
.4.2.17). >pir:pir|D45734|D45734 HisG - Lactococcus lactis s
ubsp. lactis >gp:gp|U92974|LLU92974_4 Lactococcus lactis unk
nown gene, partial cds, and HisC (hisC), unknown, HisG (hisG
), unknown, HisB (hisB), unknown, HisH (hisH), HisA (hisA),
HisF (hisF), HisIE (hisIE), unknown, unknown, LeuA (leuA), L
euB (leuB), LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD),
IlvB (ilvB), IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and
aldR (aldR) genes, complete cds. NID: g2565137.

atgttacgagttgcattagcaaagggtcgtttattaaagagttttatcgaatatttacaa
caagttaatcagatagatattgcaactgtacttttaaatagacagcgacagttattgctt 
acagtcgacaacattgaaatgattttagttaaaggaagcgatgtgcctacttatgtagaa 
caaggtattgctgatgtaggaatagtgggaagtgatattctgaatggtcaaaaatataat 
attaataaattactcgatttgccatttggtaaatgtcattttgcgttggcggcaaagcca 

gaaacatctcgctataaaaaagtagcaacaagctatgtacatacagctactcaattcttt
aataaagaaggtatggatgtagaagtgattcaccttaacggttcagttgaattgtcatgt 
gtagtggatatggttgatgctattgtagatattgtacaaactggttctacgcttacagct 
aacgggctcgttgagaaaaagcatatcagtgaaattaacgctaagttaattacaaataaa 
gaatcatattttaagcaatcatctgaaatagagagactaatcaagcagttaggagtgtct 

attaactatgcttag



Sequence 2394 

MLRVALAKGRLLKSFIEYLQQVNQIDIATVLLNRQRQLLLTVDNIEMILVKGSDVPTYVE
QGIADVGIVGSDILNGQKYNINKLLDLPFGKCHFALAAKPETSRYKKVATSYVHTATQFF 
NKEGMDVEVIHLNGSVELSCVVDMVDAIVDIVQTGSTLTANGLVEKKHISEINAKLITNK
ESYFKQSSEIERLIKQLGVSINYA* 

Sequence 2395 

Contig_0751_pos_3320_4003,

is similar to (with p-value 2.0e-41)

>sp:sp|Q02136|HISX_LACLA HISTIDINOL DEHYDROGENASE (EC 1.1.1.
23) (HDH). >pir:pir|E45734|E45734 HisD - Lactococcus lactis
subsp. lactis

gtggaaactgagaagcttgaattagagcaaagccaactaaaaaatgcatacgacatgcta 
gataatgaaacacgagatgcattagagcaaagctatcagagaattaaagtgtaccaagaa
aatattaaggtaaaacaggaatcatctcaacaaactgaatgttatgaacgataccatcct 
atcgaacgtgtaggtatttatgtgccgggaggtaaggctagctatccgtctacagtatta 
atgactgcaacacttgctcaagtagcaggtgttaatgagattactgttgttaccccacct 
caaaatagcggtatatgtcaagaggtgttagccgcttgttacattacaggcgttcatcat 
gtttatcaagtcggtggagcacaaagtattgcggcgctaacttatggcacggaaactata
aaaaaagtcgacaaaatcgtaggtccagggaatcaatatgttgcttatgccaaaaagttt 
gtattcggtcaagtaggcatagaccaaatcgcagaaccgacagaaatagccttgattata 
gacgaaagtgctgacttagacgcaatcgcttatgacgtatttgcacaagcagaacatgat 
gaaatggcttgtacttatgtgattagtgaaaatgaaaaggtacttaatcaattgaacact 
ataatacaagagaaacttcagtag

Sequence 2396 

VETEKLELEQSQLKNAYDMLDNETRDALEQSYQRIKVYQENIKVKQESSQQTECYERYHP 
IERVGIYVPGGKASYPSTVLMTATLAQVAGVNEITVVTPPQNSGICQEVLAACYITGVHH
VYQVGGAQSIAALTYGTETIKKVDKIVGPGNQYVAYAKKFVFGQVGIDQIAEPTEIALII 
DESADLDAIAYDVFAQAEHDEMACTYVISENEKVLNQLNTIIQEKLQ* 

Sequence 2397 
Contig_0751_pos_832_338,

is similar to (with p-value 3.0e-78) 

>sp:sp|P51065|PPCK_STAAU PHOSPHOENOLPYRUVATE CARBOXYKINASE (
ATP) (EC 4.1.1.49). >gp:gp|U51133|SAU51133_1 Staphylococcus
aureus phosphoenolpyruvate carboxykinase (pckA) gene, comple
te cds. NID: g1255261. >gp:gp|L42943|STAPEPCK_1 Staphylococc
us aureus (clone KIN50) phosphoenolpyruvate carboxykinase (p
ckA) gene, complete cds. NID: g860731.

atgtatcatttcttaagtggattcacgtctaaactagctggaacagaacgtggtgttact 
gaacctcaaccttcgttttcaacttgctttggtgcaccattcttacctttgagtccaaca 

aagtacgctgatctacttggaaatttaatcgatattcatgatgtagatgtatatctagta
aatactggatggacaggtggtaaatatggtgtagggcgaagaattagtctacactatact 
cgtgaaatggtagatcaagcaatatcaggtaaattaaaaaatactaaatatattaaagat 
gatacatttggcttaaatattccagttcaaattgacagtgtacctacaactattctgaat 
cctatcaatgcttggaacaataaagataactacaaagcacaagcttacgatttgattcaa 

cgctttaataataattttaaaaaattcggcaaggaagtcgaacatattgccaacaaaggt
gcatttaatcaataa 

Sequence 2398 

MYHFLSGFTSKLAGTERGVTEPQPSFSTCFGAPFLPLSPTKYADLLGNLIDIHDVDVYLV 
NTGWTGGKYGVGRRISLHYTREMVDQAISGKLKNTKYIKDDTFGLNIPVQIDSVPTTILN
PINAWNNKDNYKAQAYDLIQRFNNNFKKFGKEVEHIANKGAFNQ* 

Sequence 2399 

Contig_0752_pos_2931_2140,

putative peptide of unknown function

atgagcaataacatgtggagggggagaaagttgaacttaaaattagaccatatcattcac 
tatttacatcaattagagtcatttaagtttcccggagaaatattagaattgcaaaatggt 
ggaagacatcatcatttgggcacctttaatcaaatagcaccgattaaaaatagttatatc 
gaattgctagatgttgaaaatgagtcaaaacttagcaatatagctaaaactgaagaaggt 

cgtgtatcatttgctacaaaaatagtgcaggatcattttaaacaaggggttaaaggtatt
tgttttagaacaaaggatataaatcaggttaaaagtactttagaaaatagaggcgttgat 
gtgataggtcctattgatatggaaagagaaaacaaaaaaggtcatcaaattcgttggaga 
ttgctatatattgctaaccctgactatacagtcaaaccacctttctttatagaatgggat 
aacaacaaaaagcaaaacctatcacaaatacataatttcaacttgtcatcgtttaaaatt 

aaagaggtgattattactagcactcaacgtgaaacaacagtaagtctttggaaagaatgg
tataacctgaaaatagtaaatgaaacggctacatctactgatctcaaattagaaactgat 
gaagttatctataaaatagaagacggcaaagattcaggttttcatacattaataatgacc 
gatatcaatgccacagcaccatattcaatatttatacgtggtgctaaatatcgttttgag 
ccacccaactag 

Sequence 2400 

MSNNMWRGRKLNLKLDHIIHYLHQLESFKFPGEILELQNGGRHHHLGTFNQIAPIKNSYI

ELLDVENESKLSNIAKTEEGRVSFATKIVQDHFKQGVKGICFRTKDINQVKSTLENRGVD 
VIGPIDMERENKKGHQIRWRLLYIANPDYTVKPPFFIEWDNNKKQNLSQIHNFNLSSFKI 
KEVIITSTQRETTVSLWKEWYNLKIVNETATSTDLKLETDEVIYKIEDGKDSGFHTLIMT
DINATAPYSIFIRGAKYRFEPPN* 

Sequence 2401 

Contig_0752_pos_1740_490,
is similar to (with p-value 3.0e-24)

>gp:gp|AF099966|AF099966_1 Staphylococcus sciuri factor esse
ntial for methicillin resistance FEMA (femA) gene, complete
cds. NID: g3820631.
atggaaaagatgaacatcactaatcaacaacatgacgcatttgtgaaatctcatcccaat
ggtgatttattacaattatctaagtgggcagatacgaaaaaattaacaggatggtattca 
agaagaattgctgtcggtgaaaatggtcaaattaaaggtgttggccagctactattcaaa 
aaaatacctaaacttccatacactttatgctatgtatctaggggatttgtagctgattat 
aataataaagaagtgttagaagctctacttagctatgctaaagaagtagcaaaagatgaa 

aagtcgtatgctatcaaaatagatcccgatgtcgaagtagataaaggtgcagaagcactt
aaaaatctacgtgagcttgggtttaaacataaaggttttaaagaaggactgtctaaagac 
tatattcaaccaagaatgactatgattacgcctattgacaaaacagatgatgaattagtt 
caaagtttcgaacgtcgaaatcgttcaaaagtaagacttgcactgaagcgtggaactaaa 
gtagaacgatcaaatcgcgaggggcttaaaatctttgctaatttaatgaagataactggg 

gagagagatggttttttaactcgagatattagttattttgaaaatatatatgatgcactt
catgaagacggtgatgcagaactcttccttgttaaattagagcctaagccagtattagat 
acggttaatcaagatcttgaagcacaattagctgagaaagagaaattacaatcaaaaaag 
caagataaaaagacacttaataaacttaatgatattgataataaaattaagaaaacaaat 
gaattaaaatcggatttaacagaacttgaaaaaagcgagccagaaggtatttacttgtca 

ggagcgctcttaatgtttgcaggaaacaaatcttactatctctatggcgcttcctcgaat
gactatcgtgatttcttaccaaaccatcacatgcaatttgaaatgatgaaatatgcacgt 
gagcatggtgcaacaacctatgactttggtggtacagataatgatcctgataaagattca 
gaacattatgggttgtgggcttttaaacgagtttggggtacatatttaagtgaaaaaatt 
ggagaatttgattatgtattaaatcaaccgctatatcatttagttgagaaagtgaaacct 

cgtttaacgaaagctaaaattaaaatatcacgtaaacttaaaggtaaataa

Sequence 2402 

MEKMNITNQQHDAFVKSHPNGDLLQLSKWADTKKLTGWYSRRIAVGENGQIKGVGQLLFK 
KIPKLPYTLCYVSRGFVADYNNKEVLEALLSYAKEVAKDEKSYAIKIDPDVEVDKGAEAL 
KNLRELGFKHKGFKEGLSKDYIQPRMTMITPIDKTDDELVQSFERRNRSKVRLALKRGTK
VERSNREGLKIFANLMKITGERDGFLTRDISYFENIYDALHEDGDAELFLVKLEPKPVLD 
TVNQDLEAQLAEKEKLQSKKQDKKTLNKLNDIDNKIKKTNELKSDLTELEKSEPEGIYLS 
GALLMFAGNKSYYLYGASSNDYRDFLPNHHMQFEMMKYAREHGATTYDFGGTDNDPDKDS 
EHYGLWAFKRVWGTYLSEKIGEFDYVLNQPLYHLVEKVKPRLTKAKIKISRKLKGK* 

Sequence 2403 

Contig_0752_pos_0_359,

putative peptide of unknown function 

atggtagttttaattatattaggtggcgtttattcaagcgccaaattaaaacttgaatta 
ttaccagatgttgaaaatccagttatttcagttcaaactacaatgtctggagcaacaccc

cagtcaacacaagatgaaataagtagcaagattgataatcaagtacgctcgttggcctac 
gtaaatagtgtacagactgaatctatacctaatgcttctatagtaactgtagaatacgat 
aalggtacagatatggataaagctgaagaacaattaaaaaaagaaatcgacaaaattaag
tttaaagatggcgttggtgaacccgaattaacaaggaactctatggatgctttTGTTTG 

Sequence 2404 

MVVLIILGGVYSSAKLKLELLPDVENPVISVQTTMSGATPQSTQDEISSKIDNQVRSLAY 
VNSVQTESIPNASIVTVEYDNGTDMDKAEEQLKKEIDKIKFKDGVGEPELTRNSMDAFVX 

Sequence 2405 

Contig_0753_pos_2155_2514,

putative peptide of unknown function 

atgataaatattatattgaagaaaatagacttggaggtaattagaatgttcgttgttaca 
aatagaatcactgtaaaaaaaggatatgcaaaacaaatggcgcctaattttactaaagga
ggacctattgaatctttaaagggctttgaaggtattgaagtttggcaaattgataaagat 
gattatagcgaagatatgtatgtaaatagttggtgggaaactgaagaagattttaaaaat 
tgggtgaatagtgatgtatttaaacaagcacataaaaatactggaaaatccgaagattca 
ccagtcattaaaagcgaaattgttaaatcaaatgttttatcttctttgaacagaagataa 
Sequence 2406

MINIILKKIDLEVIRMFVVTNRITVKKGYAKQMAPNFTKGGPIESLKGFEGIEVWQIDKD
DYSEDMYVNSWWETEEDFKNWVNSDVFKQAHKNTGKSEDSPVIKSEIVKSNVLSSLNRR*



Sequence 2407 

Contig_0753_pos_4019_2700,

is similar to (with p-value 3.0e-88)

>sp:sp|P30267|YKAA_BACFI HYPOTHETICAL 50.9 KD PROTEIN IN KAT
A 3'REGION (ORF A). >pir:pir|S27491|S27491 hypothetical prot
ein A - Bacillus firmus >gp:gp|L02548|BACKATA2_1 B.firmus OR
F A and ORF B, complete cds. NID: g143118.

at^tcat*-.gttagggattatattatttttagttcctataccagtcgttcaagatgfjaaag
cagcaaacgacacttcctatagcttttttagctggtttattaaaagattggcttgytggt 
atcatgccaattttaattgtaaccatcataactgtatcaggtattttaacaatattatgc 
tctacaatttataaaaataaattaaatcctcaaggtttaatgagcagtgctttcaacgtt 
aaaataggatggcttgttttgagagtattagctgtcttcttttcttggttaacattttta 

aatattggacctgaaatgattaaatctgaagatacaggtggattagtattttcaagttta
ttacctactcttgtagcagtatttttatttgctgcaatctttttacctttattaatggag 
tatggtctattagaattacttggacccatctttagacctatcatgcgacctttgtttact 
ttacctggtagatcgacagttgataatctagcttcatttataggtgatggtacagttggt
gttttaattactagtagacaatatggtgaaggatattactctagaagagaagcaacagta 

atatccacaacctttagtgttgtatctattacgttcgctattgtcattgccgaaacaatt
agaatgcaagatcaatttttctatttttatttaacagttgtcatttcatgcttaattgca 
gcaatgattatgccaagaatttggccacttaaaaatattcctgacgaatatgctaaagaa 
gtaagtgaagaggctcgtaatgaacagctaccagaaggcaaaacagcattaaaatatggt 
tttgatttagcaactgaagttggaattaaatcgccagggtttaaagaatttttaatttca 

ggttttaaaacagttgtagatatgtggtttgtaattttaccagttgttatgagtatagga
acaatagctaccattattgctaactacacgcctgtttttgaaattataggaaaaccattt 
gttccagtactagaattgttacaaattccagaagcacatgaagcatcacaaacaatttta 
attgggtttgccgatatgttcttaccttcaattcttattgaaggggttcaaaatgatgta 
acacgttttgtaattggagcattgagtatctcacaacttgtgtatttatctgaagtaggc 

ggcgtgc.ttcttggttctaaaattccagttagtataagtaaattatttatgattti- ttta
attcgtactatcattacgcttccaataattgctttattagcgcatttatttatcggataa 



Sequence 2408 

40 MSLLGIILFLVPIPVVQDGKQQTTLPIAFLAGLLKDWLGGIMPILIVTIITVSGILTILC 
STIYKNKLNPQGLMSSAFNVKIGWLVLRVLAVFFSWLTFLNIGPEMIKSEDTGGLVFSSL
LPTLVAVFLFAAIFLPLLMEYGLLELLGPIFRPIMRPLFTLPGRSTVDNLASFIGDGTVG
VLITSRQYGEGYYSRREATVISTTFSVVSITFAIVIAETIRMQDQFFYFYLTVVISCLIA
AMIMPRIWPLKNIPDEYAKEVSEEARNEQLPEGKTALKYGFDLATEVGIKSPGFKEFLIS 

GFKTVVDMWFVILPVVMSIGTIATIIANYTPVFEIIGKPFVPVLELLQIPEAHEASQTIL
IGFADMFLPSILIEGVQNDVTRFVIGALSISQLVYLSEVGGVILGSKIPVSISKLFMIFL 
IRTIITLPIIALLAHLFIG* 

Sequence 2409 
Contig_0753_pos_1617_1108,

putative peptide of unknown function 

atgacaggtaaaacacacgcatcatgtggctttttagtcggtgcaataaccacacaatat 
tttcatacagatatatttacttctatatcagtgattgtactttcagtcatttcaagtata 
ttgccagatatatgtcatacacaaagtaaaataggaagacgatttaggcttactaguttt 
tttgtcagaattttatttggtcatagaacatttacgcattcacttttatttattatagga
attagttttttactgtacttcatacaaactccgatgtattatatggttgcaattgttatt 
ggtatgttttcgcatgttatacttgatatattaaccccaagaggtgttaaactattatat 
cctttaccatttaatatcgtatcacccattcattttaaaactgggggactagtagatgta
tctctagctactgcattaagtgttggtgcgatatatactttatttcaaccatatttaaat 
actatgatgcactattggttaatcaaataa

Sequence 2410 

MTGKTHASCGFLVGAITTQYFHTDIFTSISVIVLSVISSILPDICHTQSKIGRRFRLTSF 
FVRILFGHRTFTHSLLFIIGISFLLYFIQTPMYYMVAIVIGMFSHVILDILTPRGVKLLY
PLPFNIVSPIHFKTGGLVDVSLATALSVGAIYTLFQPYLNTMMHYWLIK* 

Sequence 2411 

Contig_0754_pos_2035_2466,

putative peptide of unknown function

atgatacaaggtttaggctatttattgtccaatataacagattataaagaattaacgaat 
ttagctcaaaatggagatcgtgatgccattgatttaaaagtaaaacatatttataaagat 
actgaaccaccaattcctggagatttaacagcagcaaattttggaaatgtattacatcac 
ttagataatcagtttacatcagctaacaaacttgcctctgcaattggcgtcgttggtgaa 

gttataacaactatggctattacattagcacgtgaatataagactaagcacgttgtatat
atcggttcatcatttaataacaatcaattactacgtgaagttgttgaaaattacactgtt
ctaagaggatttaaaccgtactatattgagaatggtgctttttcaggcgctttaggagca 
ctttacctctaa 

Sequence 2412

MIQGLGYLLSNITDYKELTNLAQNGDRDAIDLKVKHIYKDTEPPIPGDLTAANFGNVLHH
LDNQFTSANKLASAIGVVGEVITTMAITLAREYKTKHVVYIGSSFNNNQLLREVVENYTV 
LRGFKPYYIENGAFSGALGALYL*

Sequence 2413

Contig_0754_pos_4500_5048,

putative peptide of unknown function 

atggatgattataaagaatatagaaaaagacttatcgttaaattaaaaaaacctataggt 
agagatttatataatagattatataaaaatattcaagatactttagaacctgaggtttat 

gaaattgctcctaatactaaattaggacattttcctggttatcagaatgtaacgttatcc
cacccacaaatgcaacaaattatatcaagaaatgaacctagttggaaacaagctttaatg 
aatgtaaaaggtgtttatgttataaccgacttaagtaatggcaaattatacataggatca 
gcatcaggtaatactgatggaatatggcaacgatggtcggactatgccaacatagaaaat 
ctaacaggtggtaataaattattaaatgaaattaaattagataaagggaaagattacatc 

ataaataattttcaatattcaattttagagatttttgatacaaagactaaggtagacact
ataatcaatagagaaaattattggaagaatgtattttgcactagaaaatatggtatgaac 
tttaactaa 

Sequence 2414 

MDDYKEYRKRLIVKLKKPIGRDLYNRLYKNIQDTLEPEVYEIAPNTKLGHFPGYQNVTLS
HPQMQQIISRNEPSWKQALMNVKGVYVITDLSNGKLYIGSASGNTDGIWQRWSDYANIEN 
LTGGNKLLNEIKLDKGKDYIINNFQYSILEIFDTKTKVDTIINRENYWKNVFCTRKYGMN 
FN* 

Sequence 2415

Contig_0754_pos_5848_7851,

is similar to (with p-value 0.0e+00)

>sp:sp|P05425|ATKB_ENTFA POTASSIUM/COPPER-TRANSPORTING ATPAS
E B (EC 3.6.1.36). >pir:pir|B45995|B45995 Cu2+-transporting
ATPase (EC 3.6.1.-) - Enterococcus hirae >gp:gp|L13292|ENECO
PPUMP_2 Enterococcus hirae ATPase (copA) gene, complete cds;
ATPase (copB) gene, complete cds. NID: g290641.
atgcatcatgataaccatgcctcacatcatcatagtggccatgcacatcatcatggaaat 
tttaaagttaagttttttgtttcattaatttttgcaatacctatcattctcttatcgcca 

atgatgggtgttaacttaccttttcaattcacatttccaggttctgaatgggtagtgtta
atattaagtacaattttattcttttatggtggtaaaccgttcttgtctggtggtaaagat 
gaaattgctacaaaaaaaccaggcatgatgaccttagttgctctaggtatttcagtagct 
tatatttatagcttgtatgctttttatatgaataactttagtagtgcaactggtcataca 
atggactttttttgggaattagcaaccttaattttaattatgctattaggacattggata 

gaaatgaatgctgtcggaaatgctggagatgctttaaagaaaatggcagaactattacct
aatagtgctattaaagttatggataatggccaacgcgaagaagttaaaatatcagacatc 
atgactgatgatatcgtcgaagtaaaagccggagaaagcattccaacagatggtattatc 
gttcaaggacaaacatctatagatgaatccctagtcactggagaatctaaaaaagtacaa 
aaaaatcaaaatgacaacgtcatcgggggttctattaatgggtctggaacaatacaagtc
aaggttacagctgttggagaagatggatatctttctcaagttatgggacttgttaatcaa 
gcacaaaatgataaatctagtgctgaattgttatctgataaagtagcgggttatttattc 
tactttgctgtaagtgttggcgtgatttcttttattgtctggatgctcattcaaaatgat 
gttgattttgcattagaacgtcttgtaactgtgttagtcattgcttgtccacatgcttta 

ggcttggcaatacctttagtcactgcacgttctacttcaattggtgcacataatggttta
attattaaaaatagagagtctgtagaaatagctcaacatatcgattatgtaatgatggat 
aaaactggtactttaactgagggtaacttttctgtgaatcattatgagagctttaaaaat 
gatttgagtaatgatacaatattaagccttttcgcctcattagaaagtcaatctaatcac 
ccattagctataagtattgttgattttgcgaaaagtaaaaatgtttcatttactaatcca 

caagacgttaataatattccaggtgtcggattagaaggtctaattgataataaaacatat
aaaataacaaatgtctcttatcttgataaacataaacttaattatgacgatgacttgttt 
actaaattagctcaacaaggtaattcaatcagttatttaattgaggatcaacaagtcatt 
ggcatgattgctcaaggagatcaaattaaagaaagctcaaaacaaatgatagctgattta
ctatcaagaaatattacaccagtcatgcttacaggtgacaataatgaagtggcacacgct 

gtcgcaaaagaattaggtattagtgatgttcacgcacaactcatgccagaagataaggaa
agcattataaaagattatcaaagtgacggtaataaagtcatgatggtcggagacggtatc 
aacgatgcgccgagtcttataagagccgatattggtatagcaattggtgcaggcacagat 
gttgcagtggattcaggtgatatcatacttgttaaaagtaatccatcagatatcattcat
ttcttgactctttcaaataatactatgagaaaaatggtgcaaaacttatggtggggtgca 

ggttataatattgttgctgtacctttagcagctggcgcattagcttttatcgggttaata
ttatcaccagctgtaggagcaatattaatgtctttaagtacagttatagtagcgattaat 
gcttttacattaaaattaaaataa 

Sequence 2416 

MHHDNHASHHHSGHAHHHGNFKVKFFVSLIFAIPIILLSPMMGVNLPFQFTFPGSEWVVL
ILSTILFFYGGKPFLSGGKDEIATKKPGMMTLVALGISVAYIYSLYAFYMNNFSSATGHT 
MDFFWELATLILIMLLGHWIEMNAVGNAGDALKKMAELLPNSAIKVMDNGQREEVKISDI
MTDDIVEVKAGESIPTDGIIVQGQTSIDESLVTGESKKVQKNQNDNVIGGSINGSGTIQV 
KVTAVGEDGYLSQVMGLVNQAQNDKSSAELLSDKVAGYLFYFAVSVGVISFIVWMLIQND 

VDFALERLVTVLVIACPHALGLAIPLVTARSTSIGAHNGLIIKNRESVEIAQHIDYVMMD
KTGTLTEGNFSVNHYESFKNDLSNDTILSLFASLESQSNHPLAISIVDFAKSKNVSFTNP 
QDVNNIPGVGLEGLIDNKTYKITNVSYLDKHKLNYDDDLFTKLAQQGNSISYLIEDQQVI
GMIAQGDQIKESSKQMIADLLSRNITPVMLTGDNNEVAHAVAKELGISDVHAQLMPEDKE
SIIKDYQSDGNKVMMVGDGINDAPSLIRADIGIAIGAGTDVAVDSGDIILVKSNPSDIIH 

FLTLSNNTMRKMVQNLWWGAGYNIVAVPLAAGALAFIGLILSPAVGAILMSLSTVIVAIN
AFTLKLK* 

Sequence 2417 

Contig_0754_pos_3900_3526, 

putative peptide of unknown function

atggaaataggcgtaagcaaaggtttttttggtgtagcaggttttgacttactagtagat 
gataataatgatgtttatgcgattgatttaaactttaggcaaaacggatcaacgagtatg 
ctacttttagcaaaagatttaactcatggatatcataaattttacagttacttttctaat 
ggagataatacaaaattctataatgctattttaaaatacgtagaattaggtgtactttat 

ccactttcctattacgatggagattggtatggaaagaatcaagttaattctagatttggc
tgcatttggcatggggaaaataaagaattaattaatcgatatgaacaacaatttatattg 
gaagctggattataa 

Sequence 2418 

MEIGVSKGFFGVAGFDLLVDDNNDVYAIDLNFRQNGSTSMLLLAKDLTHGYHKFYSYFSN
GDNTKFYNAILKYVELGVLYPLSYYDGDWYGKNQVNSRFGCIWHGENKELINRYEQQFIL 
EAGL*

Sequence 2419 
Contig_0754_pos_3352_2687,

putative peptide of unknown function 

atgtcatacaaatatgaagcattttttaaagatattttgattaatgaatatatttatttt 
gcttcaaaaaataaaaaattaattagaatacaacatgagaatttgccatatattgctatg
tggacagacgaaaatgttgctgagtcttatttgttacatcattcaattgattacgacaaa
atcattagagcagatattgaccgttttgtaacatatgaaatggatgaaatctttgatcca 
ggtgacaaagttttagttaatgtgaataatggtgaagaaggaaacattgtagatatagtt 
aaaatgactgatgagttgatgtctgaattagatgatataagaatgagagagtttattaaa 
ga' gtcg.-.aaaatatgacgaagtatacggattgacaaacaaaggtgaaaagaatt^catt 
atgatttcagatgatgaccataacaaaccacacatcatgcctgtttggagtattaagagt
agagcgcgtaaagtacgtgatcaagattttgaagaatgtgatttaatcgaaattgaaggt 
gaagtctttagtgaatggttagacaagttacgcgatgataataaagcagtagcgattgat 
ttgaaatcaggtgttgttggtactgttgtatcagcgcaaaaactgtcaaatgaagcaaca 
ttttaa 

Sequence 2420 

MSYKYEAFFKDILINEYIYFASKNKKLIRIQHENLPYIAMWTDENVAESYLLHHSIDYDK
IIRADIDRFVTYEMDEIFDPGDKVLVNVNNGEEGNIVDIVKMTDELMSELDDIRMREFIK 
DVAKYDEVYGLTNKGEKNFIMISDDDHNKPHIMPVWSIKSRARKVRDQDFEECDLIEIEG
EVFSEWLDKLRDDNKAVAIDLKSGVVGTVVSAQKLSNEATF*

Sequence 2421 
Contig_0754_pos_1366_569, 

putative peptide of unknown function 

atgtttataaaaaaggattttgatgatattacagttcaagtatttgaagaaaaatataga
gatgcacttaaccaatttgaattaagtgaacgacaacaaatatattcttcattgcctcaa 
actgttttagatgatgcattaaaagatgaaaatcgaattgctaatgtagctttaaataaa
gaaggaaaagtagtggggttcttcgttttgcatcgttattatcaacatgaaggttar.gat 
acaccarificaatgttgtttatgtacgttcattgtcagttaatgaaaagtttcaag»jccat 

ggatatgggacaaaaatgatgatgtttttaccagagtatgttcaagcattatttcctgat
tttacacatttatacttagtagtagacgctgaaaaccaaagtgcttggaacgtttatgaa 
cgtgcaggttttatgcatacagctacaaaagaagaaggacctattgggaaagaaagactt 
tattatttagatttagattcaaaacatgtatcttctttaaggctaaaagagggggaagtc 
acatataatgatgatattcacgtgattaatttgcttaaagatgatgtaaaggtaggcttt 

attgcactagaacaaaatgataataaaatgaatatttctgcaatcgaagttaataagaaa
aataggaatgagggaattgcagaaagtgctttacgccaattaccaacgtatatacgtaaa 
cagtttgaagacattgaagttttatcaattactttagtgctccagcacgcactaactcat
ttaatgtatctgcaataa 

Sequence 2422

MFIKKDFDDITVQVFEEKYRDALNQFELSERQQIYSSLPQTVLDDALKDENRIANVALNK 
EGKVVGFFVLHRYYQHEGYDTPNNVVYVRSLSVNEKFQGHGYGTKMMMFLPEYVQALFPD 
FTHLYLVVDAENQSAWNVYERAGFMHTATKEEGPIGKERLYYLDLDSKHVSSLRLKEGEV 
TYNDDIHVINLLKDDVKVGFIALEQNDNKMNISAIEVNKKNRNEGIAESALRQLPTYIRK 

QFEDIEVLSITLVLQHALTHLMYLQ*

Sequence 2423 
Contig_0755_pos_1306_1917, 
is similar to (with p-value 8.0e-42)

>gp:gp|AF012285|AF012285_40 Bacillus subtilis mobA-nprE gene
region. NID: g3282109. >gp:gp|Z99111|BSUB0008_137 Bacillus
subtilis complete genome (section 8 of 21): from 1394791 to
1603020. NID: g2633699.

atgacccaatatactttttcacctaaagattttaaagcttttgaagtcgaaggtttagac 
caaagaatggaagcacttaatgactatgtcagacctcaacttcatcaattaggatcttat
tttgaagaatatttcactacacaaacaggtgaaactttttatgctcacgtagctaaacac 
gcacgtagaagtgtcaatccacctatcgatacgtgggtagcttttgctcctaataaacgt 
ggttataaaatgttaccacactttcaaatcggattgtttagaaatcagcttttcattatg 
ttcggtatcatgcacgaaggtagaaataaagaagaaaaagtgaaaatatttgataaacat 
tttgataaactgacatctttaccaagtgattatagtgtttctctagatcatatga^aact
gaaaagcactatatcaaggatatgagtaatgaagagttgcatgctgctatcgatagagtt 
aaaaatgttaaaaaaggtgaattttttgttgccagaacattatcaccaaccgataaaaga 
ttaaaatctgataagtcttttctaaaatttgttgaggaaacttttgatgaatttttaaaa 
ttttatcaataa

Sequence 2424 

MTQYTFSPKDFKAFEVEGLDQRMEALNDYVRPQLHQLGSYFEEYFTTQTGETFYAHVAKH 
ARRSVNPPIDTWVAFAPNKRGYKMLPHFQIGLFRNQLFIMFGIMHEGRNKEEKVKIFDKH 
FDKLTSLPSDYSVSLDHMKTEKHYIKDMSNEELHAAIDRVKNVKKGEFFVARTLSPTDKR
LKSDKSFLKFVEETFDEFLKFYQ* 

Sequence 2425 

Contig_0755_pos_5604_4453,

putative peptide of unknown function

atgttaggagagcaatatacacaaattaagcgtccagcaaatcggctaactgaaaaaata 
ttaggttggtttagttgggtattcttactcatattaactattgtttcaatgtttattgcg 
ctcgtatcttttagtaatgatacgtcaattgccaatttagaaaacacacttaataataat 
gaactcgtacaacaaattttagccaataatgatttaagtacaactcaatttgtgatttgg 

ttacaaaatggagtttgggcaattattgtttattttattgtttgtttgctcatctcgttt
ttagcgttaatttctatgaatataagaattttgtctggtttactttttttaatagctgct 
atagtcacaattccgcttgtattgttgattgtaactctaatcattcctatcttattcttt 
atcattgcaatgatgatgtttgctagaagagatagaatagaaacagtgccatcttattat 
aatgaatatgatcaaccatactatgatgagagaggtttttatgaaccagagtcaagaaat 

gaacatggatataatgatgatgtgtatgaacctatgcatactaaaaaggaagatagaaat
acaagacgtcaattcaatagaaatgctcagcaacaagattcctataatggtataactgat 
aatcaacccgatgaagatacatcttccgatcaactttattcagacgaatatgtagataat 
gaagataaatattctcaatttccaaaaagagcagttgaaagtgaatatgcatctcaacaa
actgaagatgaaccaacagtcatgtcaagacaagctaagtacaataaaaaatctaaaaat 

acggattttgaagatgcgcaacaggaacatatggaaggtaatcaatttgatgacatagga
gttgttgaaccacaaattgatcctaaagaactaaaagcgcaaagaaaaagagaaaaagca 
gaaatacgtgctaagaaaaaagaaaagagaaaagcatataataaacgtatgaaagaacga 
agaaaaaaccagccaagtgctgttaaccaacgacgtatgaattatgaagaacgtcgacaa 
atgattaataatgaacaagaagatacagataataacttaaatcaacaggaagattcaaaa 

aaagaaaattaa

Sequence 2426 

MLGEQYTQIKRPANRLTEKILGWFSWVFLLILTIVSMFIALVSFSNDTSIANLENTLNNN 
ELVQQILANNDLSTTQFVIWLQNGVWAIIVYFIVCLLISFLALISMNIRILSGLLFLIAA
IVTIPLVLLIVTLIIPILFFIIAMMMFARRDRIETVPSYYNEYDQPYYDERGFYEPESRN
EHGYNDDVYEPMHTKKEDRNTRRQFNRNAQQQDSYNGITDNQPDEDTSSDQLYSDEYVDN 
EDKYSQFPKRAVESEYASQQTEDEPTVMSRQAKYNKKSKNTDFEDAQQEHMEGNQFDDIG 
VVEPQIDPKELKAQRKREKAEIRAKKKEKRKAYNKRMKERRKNQPSAVNQRRMNYEERRQ 
MINNEQEDTDNNLNQQEDSKKEN* 

Sequence 2427 

Contig_0755_pos_1075_314,

is similar to (with p-value 2.0e-65) 

>sp:sp|Q45499|SUHB_BACSU EXTRAGENIC SUPPRESSOR PROTEIN SUHB
HOMOLOG. >gp:gp|AF012285|AF012285_41 Bacillus subtilis mobA-
nprE gene region. NID: g3282109. >gp:gp|Z99111|BSUB0008_139
Bacillus subtilis complete genome (section 8 of 21): from 13
94791 to 1603020. NID: g2633699.

atgatgcaagaagagttagacattaaaactaaatcgaacccaaatgatttagttacaaat 
gtggataaggcgacagagaattatctatatgaaacgattcttcataattatccagatcat
caggttattggcgaagagggacatggtcataatctcgagtatttaaagggggttatttgg
gttattgatccaattgatggaacacttaattttgttcaccaaaaagaaaattttgccatc 
tctattggtatttatcatgatgggaagccttatgcaggttttgtttatgatgtcatgaaa 
gatgttttatatcatgcaaaggttggacagggtgcatttgaaaatacacataaactngaa 
atgattcaaaatactgaacttaaaagaagtattataggtattaatcccaattggctgacg
aaaccaatactcagtgatatttttagttcaatagtgaatgaggcaagaagtgcacgagca 
tatggtagtgcagcattagaaattataagtgtagcgaagggtcaattggcggcttaccta 
acacctagactacaaccgtgggattttgcaggtggattgttgattttgaacgaagtaggt 
gggataggaaccaacttattaggcgataaattagacttcaatcaaccgaattcaatatta
atagcaaatcctagccttcatcgtgaaatattaaatcatcatttaaatcagcaaagagat 
acgcttattacacttcatgaaaaaaggtttggaaagagatag 

Sequence 2428

MMQEELDIKTKSNPNDLVTNVDKATENYLYETILHNYPDHQVIGEEGHGHNLEYLKGVIW
VIDPIDGTLNFVHQKENFAISIGIYHDGKPYAGFVYDVMKDVLYHAKVGQGAFENTHKLE 
MIQNTELKRSIIGINPNWLTKPILSDIFSSIVNEARSARAYGSAALEIISVAKGQLAAYL 
TPRLQPWDFAGGLLILNEVGGIGTNLLGDKLDFNQPNSILIANPSLHREILNHHLNQQRD 
TLITLHEKRFGKR* 

Sequence 2429

Contig_0756_pos_1371_1877, 

putative peptide of unknown function 

atgaaagacaacaaacctaataattcgaaattaattcaaacatatttaagtaagaaaact 
ttaagatatggtacagcaagtgcattaacattggcactctatttatttaacagtaacgta
actgtgtatgcggatgaaaatactgcaaaccaaaatcaaggaacatcaccaaaaacttca 
cagacagcacctacaaataatactgaaaatacagatgccacagccataacaacagatcaa 
aataataatgatgaagaagaatacgatgcgtcatatgaacttccaattctttatgtaact 
gtctggctagatgatcaaggaaatattattaaagatgctgtggaagatgctaaaacccct 
gcttcagaaaggcaaccggtgaaaattcctgggtaccaacattatagaacttctgtgagt
gacggaattactaagtttatttatcgtaaaattagcactgcacaatcacctatagttgaa
aatcaaaaacgtatggtcgtggcatag 

Sequence 2430 

MKDNKPNNSKLIQTYLSKKTLRYGTASALTLALYLFNSNVTVYADENTANQNQGTSPKTS
QTAPTNNTENTDATAITTDQNNNDEEEYDASYELPILYVTVWLDDQGNIIKDAVEDAKTP
ASERQPVKIPGYQHYRTSVSDGITKFIYRKISTAQSPIVENQKRMVVA* 

Sequence 2431 
Contig_0756_pos_4998_4510,

putative peptide of unknown function 

atgccaaaagtacatcaagttaaggaaagatttgtgaaattaggggaccaacagtttaaa 
gcatttgaaattagatacgatacatacattcattacgtgttgatgtgtgatggtgtagat 
ttagcaatgaaacagcgcgtggaagattttgtcagtgcgcaaacatggcatcaacaattt 

aaaacgattggcgtcatgctttttcaacaagataaacaattcatatatccactgatacat
atacctaaaatagatagcttaatctgggaaaatagctgtggttcaggagcggcttctatc 
ggtgtgttagttaattatctaacagatcatgatattcaagattacctagttaaccaaccc 
ggaggcagtattattgtctcatccagaaagtctggacaaaatgaataccaaacaacgatt 
aagtgtcaagtttcaactgtcgcaacaggacaagcatatatagaacaggagacaatgacg 

caaatatga

Sequence 2432 

MPKVHQVKERFVKLGDQQFKAFEIRYDTYIHYVLMCDGVDLAMKQRVEDFVSAQTWHQQF 
KTIGVMLFQQDKQFIYPLIHIPKIDSLIWENSCGSGAASIGVLVNYLTDHDIQDYLVNQP 
GGSIIVSSRKSGQNEYQTTIKCQVSTVATGQAYIEQETMTQI*

Sequence 2433 
Contig_0756_pos_3690_2413,
putative peptide of unknown function 
atggttggtagtggaccggtcgctattcaacttgctcgactatgtcatttacatggagaa
catatagttgatatggtgagtcgcgttcatgcatcaaccaaatctaagagagtctttgat 
gcttatcaacgtgacggctttttttcagtaatgactcaaaatgatgcacatcagtgtttt 
tcaggtaagtttacggttagacatttttttaaagatgttaaagatattactgaatattat 
gacgtggtgattttagcatgtactgccgatgcgtatcgaccgatattacagcaattatct 
aagtccacattaaagcgtattaagcaaatcatcttggtctcaccaacattaggatcacat
atgcttgttaagcaattactatcagatgttcaatgtgaaggtgaagtgatttcattttcc
acttatttaggcgatacccgaatatttgataaagcacaaccacattgtgtcctaaccaca
cgagttaaatcaaaattattcgtaggttcgactcaatctcagtctatgacgttgtgtaag 
cttaagtctttatttgactatttgaatatagaattaacaacgatggacacaccactacat
gcggagatacataatagttcactttatgtacacccaccattgtttatgaatcaattttca 
ttaaaggcggtatttgaagggacgaaagtaccagtatatgtatataagctatttccagag 
ggtccaatcacaatgaccttaatacacgaaatgcgattaatgtggcaagaaatgatgatg 
atattaaaaaaattaaaggtaccttcggtcaatcttctaaagtttatggtgaaagaaaac 

taccctatacgttatgagaccatgcgcgaagtagatattgaaaactttaaaaatttacca
gctattcatcaagagtatctactttatgtgcgatatacagcaattttaatcgatccgttt 
tctaatccggacgatcaaggtgcatattttgatttttctgccgtaccatacaaacatgtt 
gatactgatgaacaaggagtcatacatataccacgcatgccgagtgaagattattatcgt 
actttgataattcaagcgattggaagagcattaaacgttgcaacaccgatgattgacaca 

ttgttattacgttatgaaaatactgttaaacaatactgtgacacacatttacatcaacaa
ctatcaaggcaattcgaattacatcattttaaacaggatttagcgttagtgacgaactac 
ttaactttttataaataa 

Sequence 2434 

MVGSGPVAIQLARLCHLHGEHIVDMVSRVHASTKSKRVFDAYQRDGFFSVMTQNDAHQCF
SGKFTVRHFFKDVKDITEYYDVVILACTADAYRPILQQLSKSTLKRIKQIILVSPTLGSH 

MLVKQLLSDVQCEGEVISFSTYLGDTRIFDKAQPHCVLTTRVKSKLFVGSTQSQSMTLCK
LKSLFDYLNIELTTMDTPLHAEIHNSSLYVHPPLFMNQFSLKAVFEGTKVPVYVYKLFPE
GPITMTLIHEMRLMWQEMMMILKKLKVPSVNLLKFMVKENYPIRYETMREVDIENFKNLP

AIHQEYLLYVRYTAILIDPFSNPDDQGAYFDFSAVPYKHVDTDEQGVIHIPRMPSEDYYR
TLIIQAIGRALNVATPMIDTLLLRYENTVKQYCDTHLHQQLSRQFELHHFKQDLALVTNY 
LTFYK* 

Sequence 2435
Contig_0756_pos_2362_1841,

is similar to (with p-value 3.0e-63) 

>gp:gp|AF076683|AF076683_1 Staphylococcus aureus oligopeptid
e transporter putative substrate binding domain (opp-1A), ol
igopeptide transporter putative membrane permease domain (op
p-1B), oligopeptide transporter putative membrane permease d
omain (opp-1C), oligopeptide transporter putative ATPase dom
ain (opp-1D), and oligopeptide transporter putative ATPase d
omain (opp-1F) genes, complete cds; and unknown gene. NID: g
3800817.

atgaataaactcacaaaactaagtacagtcatttttgtatctggaattattttagccggt
tgtggaaataacaaagaactaacagagaaaaaagagaataaagtattatcatatacaact 
gtcaaagatattggagatatgaatccccatgtttatggaggttcaatgtcagcagaaagt 
atgatttatgagccgttagttcgcaataccaaggatggtattaagccattattagcaaaa 
aaatgggacatttcacctgatggtaagacatatacgtttcatttaagggatgatgtatct 

tttcatgatggtacgaaatttgatgcagatgcagtgaagaaaaacatcgatgcagtacaa
caaaataagaaactacattcatggttaagactttcaacactgattgatgatgtcaaagtt 
aaggataagtatacgatacaactacatttgaaggaagcttatcaacctgcgttagcagaa 
ctagctatgccacgaccatacgtttttgattttcaactatag 

Sequence 2436

MNKLTKLSTVIFVSGIILAGCGNNKELTEKKENKVLSYTTVKDIGDMNPHVYGGSMSAES 

MIYEPLVRNTKDGIKPLLAKKWDISPDGKTYTFHLRDDVSFHDGTKFDADAVKKNIDAVQ 
QNKKLHSWLRLSTLIDDVKVKDKYTIQLHLKEAYQPALAELAMPRPYVFDFQL* 

Sequence 2437

Contig_0757_pos_801_1223, 

putative peptide of unknown function 

atgagaaaatggttaaccttactattaattacaacattggtgttaactgcatgtggtaaa 
agtaacgaaaaagcttctttagaaaaaagcattgatcagttgaaaaaagaaaataaggat 
ttaaaaaaacagaagaaaaagttacaagagcaaaaggataagcttaaacacaaacaggat
agtctccaagaagatgtaaatgacttgcctgctaaaagcacatcccgagataagaaaaat 
aaagataatcatgatgcaaaagaaaagtcttcagataatcaatcgacatctgctaatcat 
gatgatcaaactaacaaaataaaaagcaatcaagatgaacatgacagtcaatcctctaaa 
ccacatacacagcagaagccctcacagaatgatagaaaaaataatcatcgacaagaacga
tag 

Sequence 2438 

MRKWLTLLLITTLVLTACGKSNEKASLEKSIDQLKKENKDLKKQKKKLQEQKDKLKHKQD 
SLQEDVNDLPAKSTSRDKKNKDNHDAKEKSSDNQSTSANHDDQTNKIKSNQDEHDSQSSK
PHTQQKPSQNDRKNNHRQER* 

Sequence 2439

Contig_0757_pos_4581_4994,

is similar to (with p-value 5.0e-18)

>gp:gp|AF012906|AF012906_5 Bacillus subtilis yojP gene, part
ial cds; yojQ/S, yojR, yojT, yojU, yojV, yojW, yojX, yojY, y
ojZ, and yokA genes, complete cds. NID: g2522404. >gp:gp|Z99
114|BSUB0011_163 Bacillus subtilis complete genome (section
11 of 21): from 2000171 to 2207900. NID: g2634230. >gp:gp|AF
020713|AF020713_166 Bacteriophage SPBc2 complete genome. NID
: g3025478.

atgataggtacttatcaaagtgataaaaactttgaaatgatgaagacttttaagcattgg 
attcagactaatcattattggaaatatgttgagaaatacggtgtgttaggtatagcatta 
gataatcctctccacgttcaaagtaatcaatgtagatatgacgttgttttgagaatagat
gaaacagtaaatgatcagacaatatctaaaagagattttacaggtggcatatatgctgtg 
tttaaagttagtcatacaaaaataaatatagagaagttctttagcaatttagaaaatatt 
ttaaatgaaagtcatttgcgtatgagaaatgaaccaattatagagagatacattgaagaa 
gagggaacagataaagtgtgtgaaatgttagtgcctatctatgaagtaaattaa 


Sequence 2440 

MIGTYQSDKNFEMMKTFKHWIQTNHYWKYVEKYGVLGIALDNPLHVQSNQCRYDVVLRID
ETVNDQTISKRDFTGGIYAVFKVSHTKINIEKFFSNLENILNESHLRMRNEPIIERYIEE
EGTDKVCEMLVPIYEVN*


Sequence 2441

Contig_0757_pos_6065_5280,

is similar to (with p-value 6.0e-63) 

>gp:gp|U87792|BSU87792_1 Bacillus subtilis tRNA-Ala, phospha
tidylglycerophosphate synthase (pgsA) and CinA (cinA) genes,
complete cds, and RecA (recA) gene, partial cds. NID: g1842
434.

atgatacttgtcgatgatatgtggttaaagtcaactaattttctcggttctcaatcagca 
ttcacatttaaagttgttatacagttaggttcagtatttgcggccgcctgggtatttaga 

gaacgcttcttagaaattttacatattggccaacataaacctgaaccttccacttcggga
gacagacgttcaaaaccacgacgtctgaatttaatacatgtattagtaggtatggtccca
gcagggattttaggatttttatttgatgatttaattgaaaaatacttatttagtgtacca 
acagtcttaattggtttatttataggtgccatttatatgattatagctgataagtattct 
aaaactgttcagcatcctcaaacagtagatcaaattaattatttccaagcatttgtcatt 

ggtatctctcaagcaatagctatgtggcctggatttagtagatccggttcaacgatttca
acaggtgttcttatgaaattgaatcataaagctgcatctgatttcacttttattatgtcg 
gtaccaattatgttagctgcaagtggattatctttactaaaacattatgagtatattcat
tt;,gcacacataccattctacattttaggatttttagcggcatttattgttggattjatt 
gcaattaaaacattcttacacttaattaataaagttaagttagtaccttttgctatttat

agaattgtcttagttatttttatagcaatcctatacttcggattcggtattggcaaagga
atttaa 

Sequence 2442

MILVDDMWLKSTNFLGSQSAFTFKVVIQLGSVFAAAWVFRERFLEILHIGQHKPEPSTSG 
DRRSKPRRLNLIHVLVGMVPAGILGFLFDDLIEKYLFSVPTVLIGLFIGAIYMIIADKYS
KTVQHPQTVDQINYFQAFVIGISQAIAMWPGFSRSGSTISTGVLMKLNHKAASDFTFIMS 
VPIMLAASGLSLLKHYEYIHLAHIPFYILGFLAAFIVGLIAIKTFLHLINKVKLVPFAIY 
RIVLVIFIAILYFGFGIGKGI* 


Sequence 2443

Contig_0757_pos_3653_3204,

putative peptide of unknown function 

atgaagactaacgtaattatagatggagatgcttgtcctgttgttaattctgtcattgaa 
ttgacgaaagggacaggcatttttgttacaattttaagaagttttagccatttttcacaa
caaatacaacccgaacatgtaaaaattgtatacgttgatgacggtcccgatgcggtagac 
tataaaatagtcgaacttgctagcaataatgatatcgtcatcacacaagattatggactt 
gctagtctactgatagacaaagtacatactgtcatgcatcataaaggaaatatttatcac 
tcaaacaacatccaaagcttattaaatcaaagatatctaaatgctcaaataagaagacga 
ggtggtcgtcacaaaggccctcctcccttcacaacagaggatagacttaaattcgagcat
gcttttagaaaaatcattaatcaaatatag

Sequence 2444

MKTNVIIDGDACPVVNSVIELTKGTGIFVTILRSFSHFSQQIQPEHVKIVYVDDGPDAVD 
YKIVELASNNDIVITQDYGLASLLIDKVHTVMHHKGNIYHSNNIQSLLNQRYLNAQIRRR
GGRHKGPPPFTTEDRLKFEHAFRKIINQI* 

Sequence 2445 
Contig_0758_pos_2562_3488,

is similar to (with p-value 0.0e+00)

>gp:gp|AF072726|AF072726_1 Staphylococcus aureus putative he
me A synthase (ctaA) gene, complete cds. NID: g3320605.
atgatggggtgtttatcattgtttagaaagcaaaaccttaaatggttaggtgttttagct 
acgattattatgacctttgtacaattaggtggcgccctcgtaactaaaacgggatcagaa 

gatggttgtggctcgtcttggcctttatgtaatggcgctttacttccagaaaatttacca
atacaaacaattatagaactgagtcatcgcgcagtatcagccatttcacttatagttgta 
ttatggcttgtaattacagcttggaaaaacattggatatattaaagaaatcaaaccactc 
tctattattagtgttggttttttattagttcaagcacttgtaggtgctgctgctgtgata 
tggcaacaaaatccttatgtattagcgctacattttggtatttcacttatcagtttctct 

tctgttttcttaatgacattaattattttctcaattgacaaaaaatatgaagctgacatt
ttatttattcacaaacctttacgtatcttaacttggttaatggctatcatcgtatactta 
actatttatacaggtgctttagttagacatactaaatcaagtcttgcttatggtgcttgg
cctattccatttgatgatatcgttcctcataatgcgcatgattgggtacaattttcgcac 
agaggtatggcgctcatcacttttatctggattatgattacatttatacacgctattaag 

aattattcagataatcgaactgtacgttatggttatactgcatcatttatacttgttatc
cttcaagttattacaggtgctttatcagtcataactaatgtcaatttaattattgcgtta 
ttccatgctttgtttatcacttacttattcggaatgattgcttattttattttactaatg 
ttaagaacgacgagaagtcaaaaataa 

Sequence 2446

MMGCLSLFRKQNLKWLGVLATIIMTFVQLGGALVTKTGSEDGCGSSWPLCNGALLPENLP 
IQTIIELSHRAVSAISLIVVLWLVITAWKNIGYIKEIKPLSIISVGFLLVQALVGAAAVI 
WQQNPYVLALHFGISLISFSSVFLMTLIIFSIDKKYEADILFIHKPLRILTWLMAIIVYL 
TIYTGALVRHTKSSLAYGAWPIPFDDIVPHNAHDWVQFSHRGMALITFIWIMITFIHAIK 

NYSDNRTVRYGYTASFILVILQVITGALSVITNVNLIIALFHALFITYLFGMIAYFILLM
LRTTRSQK* 

Sequence 2447
Contig_0758_pos_7092_3637, 
is similar to (with p-value 0.0e+00)

>gp:gp|D83706|D83706_1 Bacillus stearothermophilus DNA for p
yruvate carboxylase, complete cds. NID: g1695685.
gtgtcttggcttttgaaacaaataaagaaattacttgttgctaaccgtggtgaaatcgcc 
attagaatttttagagcggcagcagaattaaatatcagtacagtagcaatttattctaat 
gaagataaaagttcgttacatagatataaagcagatgaatcctatctagttggaagtgat
ttaggacctgctgaaagttatttgaatatcgaacgtatcatcgaagtagctcttcgcgca 
ggtgtcgatgcaattcatcctgggtatggttttttaagtgaaaatgaacaatttgcacgc 
cgatgtgctgaggaaggcattaaatttataggtccgcatcttgaacatctagacatgttt 
ggagataaggttaaggctagaacaactgctattaacgctaacttacctgtaatcccgggt
acagatggtcctattgaaagttttgaagctgcagaacagtttgctaatgaagcaggttac 
ccacttatgattaaggccacaagcggtggcggtggtaaaggtatgcgaatcgttcgtgaa 
tcaagcgaattagaagacgctttccatcgtgcgaaatcagaagccgaaaagtcatttggt 
aatagcgaagtttatatcgaaagatatattgataatccaaagcatatagaggttcaagtt 

attggtgatgaattcgggaatatcattcatttgtatgaaagagattgctccgtacaacga
cgtcatcaaaaggttgttgaagttgcaccttcagtaggtctttctaacaaattaagagag 
cgaatttgtgatgccgcaattcaactgatggaaaatataaaatacgtcaacgctggaaca 
gtagaatttttagtttctggggatgaatttttcttcattgaggttaatccacgtgttcaa 
gttgagcatacaattactgaaatgattactggtatagacattgtgaaaacgcaaatttta 

gttgctgatggagaatcgttatttggagataaaatctctatgccacagcaaaatgaaatt
caaacattagggtatgcgatacaatgtcgtataacaactgaagatcctactaatgatttt
atgccagattctggcacaattattgcatatcgatcaagtggcggttttggtgtgagactt
gatgcaggggatggattccaaggtgcagaaatttcaccttactacgattcactattagtt 
aagctttctacacatgccgtttcatttaaacaagctgaagagaaaatggaacgttcatta 

cgcgaaatgcgaattcgtggcgtaaagacgaatattccatttctcatcaatgttatgcgt
aatgataaatttagaagtggtgattatactactaaatttattgaagaaacacctgaactt 
ttcgatattgcaccgacattggacagaggtaccaagactttagagtatattggtaatgtg 
acgataaacggatttcctaatgtagaaaagcgtccaaaaccagaatatgaatctaccaaa 
atcccaaaaatttctcaaaagaaaatcaatcagttatttggaacaaaacaaattcttgag 

caacatggaccaacaggtgttacaaattgggttagagaacaagaagatgttttaattacc
gatactacatttagagatgcacaccaatctttacttgcaacacgtgtaagaacaaaagat 
atgatgaacattgcatctaaaactgctgaagtttttaaagatagtttttcattagaaatg 
tggggtggtgcaacatttgatgtcgcctataatttcttgaaagagaatccatgggaacgt 
ttagaaagattgcgcaaagccattccgaatgtgttattccaaatgttattacgagcttcg 

aacgcagtaggttataaaaactatcctgataatgtaattaagaaattcgttcatgaaagt
gcaaaagctggtgtagatgttttccgtatattcgactcattgaactgggttgatcaaatg 
aaagtagcgaatgaagctgttcaagaagctggaatggtatctgagggtacaatttgctat 
acaggtgatattttaaatgctgaacgttccaatatttatactttagattattacgttaaa 
atggctaaagaactggaaagagaaggattccatatattagcaattaaagatatggctggt 

ttattgaaaccgaaagcagcttacgaattaattggtgaattacgtgaggcaacacatctt
ccaattcatttacatacacatgatactagtggaaatggattgttgacatataaacaagca 
attgatgctggcgtagatattatagatactgctgttgcatctatgagtggtttaacgagt 
caaccaagtgcaaattcattatattatgcactaaatggatttccacgtaatttaagaact 
gatattgatgggttagaagagttgagtcattactggtctgtagtcagaccttactatgca 

gactttgagagtgatatcaaatcaccaaatacagaaatttatcaacatgaaatgccaggt
ggccaatattcaaacttaagtcaacaagctaaaagtttaggattgggcgaacgttttgat 
gaagtcaaagagatgtatcgtcgtgtcaacttcctgtttggagatcttgtaaaagtaaca 
ccatcttcaaaggtagttggagatatggcactatatatggtgcaaaatgatcttgatgaa 
gatacggtcatcaatgatggttataaattagatttcccagaatctgttgtgtcattcttt 

aaaggtgacattggacaacctgtcaacggattcaacaagaaattgcaagatgttatttta
aaaggacagcaaccaattactgaaagaccaggtgaatacttggagccggtcgattttgaa
gcaatccgtcaagaattaagcgacatacaacaagacgaggtaacagaacaagatataatt 
agttatgtactttatccgaaggtatataaacaatatattcaaacgaaagagcaatttggt 
aatgtatctttactggatacaccgacattcttatttggcatgcgtaatggtgaaacagtt 

gaaattgaaattgatactggtaaacgtctaattattaaattagaaacaatcagtgaacca
gatgagaatggtaaacgtacaatttattacgctatgaatggtcaagcaagacgtatttat 
attcaagatgaaaatgttaaaacgaatgctaatgttaaacctaaggcggataaatcaaat 
ccaaatcatattggtgctcaaatgcctggttctgtaactgaagtcaaagtgtctgtaggc 
gatgaagttcaagctaatcagccattattaatcactgaagcaatgaagatggaaacgacg 

attcaggcaccatttgatggaattattaaacaaatcaatgttgctaatggagatgccatt
gccacaggagatttattagtggaaattgaaaagtaa

Sequence 2448 

VSWLLKQIKKLLVANRGEIAIRIFRAAAELNISTVAIYSNEDKSSLHRYKADESYLVGSD 
LGPAESYLNIERIIEVALRAGVDAIHPGYGFLSENEQFARRCAEEGIKFIGPHLEHLDMF
GDKVKARTTAINANLPVIPGTDGPIESFEAAEQFANEAGYPLMIKATSGGGGKGMRIVRE 
SSELEDAFHRAKSEAEKSFGNSEVYIERYIDNPKHIEVQVIGDEFGNIIHLYERDCSVQR 
RHQKVVEVAPSVGLSNKLRERICDAAIQLMENIKYVNAGTVEFLVSGDEFFFIEVNPRVQ 
VEHTITEMITGIDIVKTQILVADGESLFGDKISMPQQNEIQTLGYAIQCRITTEDPTNDF
MPDSGTIIAYRSSGGFGVRLDAGDGFQGAEISPYYDSLLVKLSTHAVSFKQAEEKMERSL 
REMRIRGVKTNIPFLINVMRNDKFRSGDYTTKFIEETPELFDIAPTLDRGTKTLEYIGNV
TINGFPNVEKRPKPEYESTKIPKISQKKINQLFGTKQILEQHGPTGVTNWVREQEDVLIT
DTTFRDAHQSLLATRVRTKDMMNIASKTAEVFKDSFSLEMWGGATFDVAYNFLKENPWER

LERLRKAIPNVLFQMLLRASNAVGYKNYPDNVIKKFVHESAKAGVDVFRIFDSLNWVDQM
KVANEAVQEAGMVSEGTICYTGDILNAERSNIYTLDYYVKMAKELEREGFHILAIKDMAG 
LLKPKAAYELIGELREATHLPIHLHTHDTSGNGLLTYKQAIDAGVDIIDTAVASMSGLTS 
QPSANSLYYALNGFPRNLRTDIDGLEELSHYWSVVRPYYADFESDIKSPNTEIYQHEMPG
GQYSNLSQQAKSLGLGERFDEVKEMYRRVNFLFGDLVKVTPSSKVVGDMALYMVQNDLDE 

DTVINDGYKLDFPESVVSFFKGDIGQPVNGFNKKLQDVILKGQQPITERPGEYLEPVDFE
AIRQELSDIQQDEVTEQDIISYVLYPKVYKQYIQTKEQFGNVSLLDTPTFLFGMRNGETV 
EIEIDTGKRLIIKLETISEPDENGKRTIYYAMNGQARRIYIQDENVKTNANVKPKADKSN 
PNHIGAQMPGSVTEVKVSVGDEVQANQPLLITEAMKMETTIQAPFDGIIKQINVANGDAI
ATGDLLVEIEK* 


Sequence 2449

Contig_0758_pos_2303_1368, 

is similar to (with p-value 3.0e-59) 

>sp:sp|P24009|COXX_BACSU PROBABLE CYTOCHROME C OXIDASE ASSEM
BLY FACTOR. >gp:gp|Z98682|BS16823KB_3 Bacillus subtilis geno
mic DNA 23.9kB fragment. NID: g2339988. >gp:gp|X54140|BSCTAB
F_1 B. subtilis ctaB-F genes for cytochrome a assembly facto
r and cytochrome-c oxidase (EC 1.9.3.1) subunits II, I, II,
and IVB. NID: g994793. >gp:gp|Z99111|BSUB0008_160 Bacillus s
ubtilis complete genome (section 8 of 21): from 1394791 to 1
603020. NID: g2633699.

atgagaaatttaaggaggggaattatgaacaaagatcaaactttgtcacatactacgggc 
cgtgtatccttcaaagaattacaacaaattattaaaatgggccttgttcaaggtaattta 
atacctgcttttgcaggcgcatggcttgcaatagtaatgacaaaccattccttcctatct 

tccattccacaaatactattgatgctagttggctctacgcttattatggggggcgcttgt
gctttaaataattattatgatcaagatattgatcgcattatgcctagtaagcaaagtaga 
ccaacagtaaatgatagaatatctgatagaaacttattaatgttaagttttgggatgatg 
ttaataggtgaagcatgtttattcttattaaatataccttctggtgttttaggattaatt 
ggtattgttggatatgtatcttactattcaatttggtctaagcgccatacaacttggaat 

actgttgttggaagttttcctggagctgtaccaccattaattggttgggtagctatcgat
ggatcattaagtttagcagcagtagcactctttttagttgtcttttgttggcaacctatc 
catttctacgctctagcaattaaacgtagtgatgagtatgcgcttgcaaatattcctatg 
ttaccatcagtgaaaggtttcaaacggacaagagtaagcatgtttatttggttagtgtta 
ttattaccattgccattcttattatctaatttaggcgtaacttttgttgttattgctaca 

ctacttaatttaggatggttagctttaggttttacaacgttcagaaaagaatctaatcaa
actaaatgggcaacgcaaatgttcgtttattcattgaactacttagtagtattctttgca 
ctcgttgtagttgtttcattaatcaagatgatataa 

Sequence 2450 

MRNLRRGIMNKDQTLSHTTGRVSFKELQQIIKMGLVQGNLIPAFAGAWLAIVMTNHSFLS
SIPQILLMLVGSTLIMGGACALNNYYDQDIDRIMPSKQSRPTVNDRISDRNLLMLSFGMM 
LIGEACLFLLNIPSGVLGLIGIVGYVSYYSIWSKRHTTWNTVVGSFPGAVPPLIGWVAID
GSLSLAAVALFLVVFCWQPIHFYALAIKRSDEYALANIPMLPSVKGFKRTRVSMFIWLVL 
LLPLPFLLSNLGVTFVVIATLLNLGWLALGFTTFRKESNQTKWATQMFVYSLNYLVVFFA 

LVVVVSLIKMI*

Sequence 2451 

Contig_0758_pos_1351_881,

putative peptide of unknown function 
atgaaacttatgaacgtacccattttaccaacgataagtacatcgtgtattgttattagt
gcgattttagtcgctattggttgggcattgatttggaaacgtcaagttcataagcataaa 
aacattatgctatgggctgcctttttcgctttaacattctttattatttatgcagcaaga 
actatttttatcggtaatacagctttcggtggaccaagttctattaaagtttattacact 
attttcttagttttccatatcattcttgcaacagttggtggcgttttaggcttaattcaa
atcattttagccttcaaagataaacttcatattcacagaaaaattgggccttgggcttca 
ataatttggttctttaccgcaattactggtgttgcagtttatgtattgttatatgtattg 
tatccaggtggagaaacaacatcattgcttaaagctacattaggtctataa 

Sequence 2452

MKLMNVPILPTISTSCIVISAILVAIGWALIWKRQVHKHKNIMLWAAFFALTFFIIYAAR 
TIFIGNTAFGGPSSIKVYYTIFLVFHIILATVGGVLGLIQIILAFKDKLHIHRKIGPWAS 
IIWFFTAITGVAVYVLLYVLYPGGETTSLLKATLGL* 

Sequence 2453

Contig_0759_pos_5038_3959,

is similar to (with p-value 2.0e-29)

>sp:sp|P33642|YFIT_PSEAE HYPOTHETICAL 39.5 KD OXIDOREDUCTASE
IN FIMT 3'REGION (DADA*) (ORF2). >gp:gp|L48934|PSEPILRV_2 P
seudomonas aeruginosa (isolate pRIC351) pilR gene, 3' end of
cds, dada*, fimT, fimU and pilV genes, complete cds. NID: g
1161217.

atgtcaattgctagacacctcagtgcaacacacttagatgttgcagtcatagatagagat 
gtacctggaaagcatgcgtcatataaagctggaggtatgcttggcgcacaaaatgaattt 

acagaggatagtgacttgtttcaattagccatcgaatctcgtgctatgtttccacaatta
agtaaatcattattagatgaaacaggcatagacattcaatttaaaaattcaggacttatc 
aaaattgctaatgaacacgatgatatctcatctataaaacgacaatatcaatttctgaat 
agtcaagaccgtagtgtcaaacaattatcagatgatgatttgctacaacttacacatggt 
gaagttaaaccttcatacgcggccattcacataccacacgatggtcaaattaatgcacat 

cattacacactggcattattagaatcaatgaagttaagagatattaagcgttatgagtct
acagaggtcacttcaatagaacggcataatggctattattcagtgaaaaccgatcaatct 
tcaacaattgaagcgcacaaaattatcgttgcaggtggcgcatggtcttcgcaattatta 
acacaatatcatctacaacgacaagtgattggcgttaaaggtgaagttatcttattagaa 
aataacgatctttcacttactgagacattatttatgactaatggttgttacatcgttcca 

aaacaacccaatcgttttttaattggtgcgacgagtgaatttaataattattctgtcggt
actacagatgaaggtatggattggcttcttcgccatgcatatcatcgtgtacctcaacta 
aaagacagtcatatactgaagaaatggtcaggagtaagaccatacacagaaaaagaaatg 
ccagtcatggatcaaattgatgatggcttatacgtgataagtggtcattatcgaaacgga 
atat cati.gtcacctattatcggtcgtgacattgccaattggctactttctggta^ taaa 

ccatcacgttattcaagttttacagttacaaggaggaataatcatgaagtgtatcattaa



Sequence 2454 

MSIARHLSATHLDVAVIDRDVPGKHASYKAGGMLGAQNEFTEDSDLFQLAIESRAMFPQL 
SKSLLDETGIDIQFKNSGLIKIANEHDDISSIKRQYQFLNSQDRSVKQLSDDDLLQLTHG
EVKPSYAAIHIPHDGQINAHHYTLALLESMKLRDIKRYESTEVTSIERHNGYYSVKTDQS 
STIEAHKIIVAGGAWSSQLLTQYHLQRQVIGVKGEVILLENNDLSLTETLFMTNGCYIVP 
KQPNRFLIGATSEFNNYSVGTTDEGMDWLLRHAYHRVPQLKDSHILKKWSGVRPYTEKEM 
PVMDQIDDGLYVISGHYRNGILLSPIIGRDIANWLLSGIKPSRYSSFTVTRRNNHEVYH* 


Sequence 2455
Contig_0759_pos_3004_2006,
is similar to (with p-value 2.0e-52) 
>gp:gp|AF012285|AF012285_3 Bacillus subtilis mobA-nprE gene
region. NID: g3282109. >gp:gp|Z99111|BSUB0008_99 Bacillus su
btilis complete genome (section 8 of 21): from 1394791 to 16
03020. NID: g2633699.

atgtcgcgttatgaacgtcaaacacgctttgcaccatttggagaagagggtcagcaaaag
ctatcctcctctcaaatacttatttttggcgctggtgctttaggaagccatattgtagat
caactcgcacgcatgggggctcatcatattgcaatcgtcgatatggatattgttgaaatt 
tcaaatttacatcgacaaacactcttcgatgaagaagacgcacatactttaatatccaaa 
gttgaagcaatcaagcataaggttaatcaaattaatataaatgtcaatctaacaacttat 
gatttagaagttacttcatcaaatatcgaaaatttgataaaaaatgtcgaaccagacatc
atcattgatggcatggataacttcaaaatacgatacctgattaatgaggtttgtcacaag 
tatcaaatcccatgggtttatggtgcagctgttggtagtaaaggatcagtatatggaata 
gatcaccaaggaccatgtctaaaatgtttattgcaaacaattcctgacacaggggaaagt 
tgcgctattaatggcgtaattccccctgttatatcaatgattgcaagctatgaagtagca 

gacgccgtacgttatctttcaggaaaaggattttcaaagcaattaatcactattgatgca
tttaatatcaattataagtcaatgaatgtagatgcactcaaaaataaagattgcccagtg 
tgtgaaaaacatgaatatacgttactagaaagccaacaagaacgtactattgaggacttg 
tgtgggaatgcttatttatttagattcccacctaaagcttttaaacacgctgcccatttc 
cctgggaatatggtgaaatctacttcctttgccaaattaattcaatatcaaacttatgaa 

ttcaccttgtttaaagatggtcgtatgaatgcatatggtatacacaatgatgaagaagca
catcacctatacaatacgttgttaaaatccatacgctaa 

Sequence 2456 

MSRYERQTRFAPFGEEGQQKLSSSQILIFGAGALGSHIVDQLARMGAHHIAIVDMDIVEI 
SNLHRQTLFDEEDAHTLISKVEAIKHKVNQININVNLTTYDLEVTSSNIENLIKNVEPDI
IIDGMDNFKIRYLINEVCHKYQIPWVYGAAVGSKGSVYGIDHQGPCLKCLLQTIPDTGES 
CAINGVIPPVISMIASYEVAEAVRYLSGKGFSKQLITIDAFNINYKSMNVDALKNKDCPV 
CEKHEYTLLESQQERTIEDLCGNAYLFRFPPKAFKHAAHFPGNMVKSTSFAKLIQYQTYE 
FTLFKDGRMNAYGIHNDEEAHHLYNTLLKSIR* 


Sequence 2457 

Contig_0759_pos_1898_1284,

is similar to (with p-value 4.0e-38)

>sp:sp|Q48630|APL_LACLA ALKALINE PHOSPHATASE LIKE PROTEIN. >
pir:pir|S39339|S39339 alkaline phosphatase-like protein - La
ctococcus lactis >gp:gp|Z29065|LLALPHLP_2 L.lactis (MG1363)
apl gene for alkaline phosphatase like protein. NID: g435295

atggaacaaattatcactgattttattagtaagtggggttatacagcgatattcatttta 
atcttattagagaacgtattacctgtcgttccatctgagattattttaacttttgcaggc
ttattatctgtgaaatcacacttatctatttggacattattaatcatagcaacaattgct 
tcattcattggtttactcattttgtattatatttgtagacttatctcagaagagaaatta 
tatcgtttcgttgatcgacatggtaagtggatgaagttaaaaagtaaagatttgaaacgg 
gcaaatgattggtttaaaaagtatggtgcgtgggctgtatttttatgtcgttttgtccca 
gtacttcgagtattaattacaatacctgctggcattaatcgaatgaacgttatacagttt
acaactttatctttaataggtactacaatttggaattttgctttaatactgctcggtcgt 
ttgctcagtgacagttttgacgctttgatgaatggtattcatacatattcacgtatcatg 
tatgtcattattattattgcagtcatatattttgttatacgttatttaatgaaacgtcgt 
cggagtgttaaataa 


Sequence 2458 

MEQIITDFISKWGYTAIFILILLENVLPVVPSEIILTFAGLLSVKSHLSIWTLLIIATIA 
SFIGLLILYYICRLISEEKLYRFVDRHGKWMKLKSKDLKRANDWFKKYGAWAVFLCRFVP 
VLRVLITIPAGINRMNVIQFTTLSLIGTTIWNFALILLGRLLSDSFDALMNGIHTYSRIM 
YVIIIIAVIYFVIRYLMKRRRSVK*

Sequence 2459 
Contig_0759_pos_1086_151,
is similar to (with p-value 0.0e+00)
>gp:gp|D78193|BACGNTZA_11 Bacillus subtilis 36kb sequence be
tween gntZ and trnY genes encoding 34 ORFs. NID: g1064780. >
gp:gp|Z99124|BSUB0021_124 Bacillus subtilis complete genome
(section 21 of 21): from 3999281 to 4214814. NID: g2636442.
atgaaacaaaaatatctagatttactctcacaaaaatttgacagtgcagaaaaacttgct 
actgaaattattaacttagagtcaatcttagaattacctaaagggactgaacattttgtt
ag'.gacc^tcatggtgaatacgaatctttccaacatgttttaagaaacggatctgoaaat 
gtgcgtgctaaaattaatgatatcttcaaagataaattatcccagcaagaaatcaacgac
ttagcagcattagtatactatccggaagaaaaactaaaattagttaaaaataatttcgat 
tcaatcggaacattaaatatttggtatattacaaccattcaacgattaattgatttaatt
acatattgctcatcaaaatatacacgttcaaaattacgcaaagcattacctgaacaatac 
gtttatattattgaagagctactttacaagagcaatgaatttcataataaaaagccttat 
tatgaaacattagttaaccaaattattgaattagaacaatcagatgatttaatcattggc 
ctttcctatactgtacaacgtctagtcgtagaccatcttcatgtcgtgggcgatatctat 

gaccgtggtcctaaacctgataagattatggatacattaataaattatcattctgtagat
atccaatggggaaatcatgatgtattatggattggcgcctatgctggttcaaaagtatgt 
cttgctaaccttctacgtatctgtgcacgttatgataatttagatattattgaagatgca 
tatggcatcaatctacgccctttacttacgcttgctgaaaagtattacgatgctgaaaac 
ccagcgtttaaacctaagaaacgaccagataaagacgtcagtcttacaaaacgcgagaaa 

gtcaaatcacaaaaattcatcaagcaattgcgatga

Sequence 2460

MKQKYLDLLSQKFDSAEKLATEIINLESILELPKGTEHFVSDLHGEYESFQHVLRNGSGN 
VRAKINDIFKDKLSQQEINDLAALVYYPEEKLKLVKNNFDSIGTLNIWYITTIQRLIDLI 
TYCSSKYTRSKLRKALPEQYVYIIEELLYKSNEFHNKKPYYETLVNQIIELEQSDDLIIG
LSYTVQRLVVDHLHVVGDIYDRGPKPDKIMDTLINYHSVDIQWGNHDVLWIGAYAGSKVC 
LANLLRICARYDNLDIIEDAYGINLRPLLTLAEKYYDAENPAFKPKKRPDKDVSLTKREK 
VKSQKFIKQLR* 

Sequence 2461

Contig_0762_pos_5747_4368,

is similar to (with p-value 0.0e+00)

>sp:sp|P25811|THDF_BACSU POSSIBLE THIOPHENE AND FURAN OXIDAT
ION PROTEIN THDF. >pir:pir|JQ1215|JQ1215 hypothetical 50K pr
otein - Bacillus subtilis >gp:gp|D26185|BAC180K_60 B. subtil
is DNA, 180 kilobase region of replication origin. NID: g467
326. >gp:gp|X62539|BSORIGS_5 B. subtilis genes rpmH, rnpA, 50
kd, gidA and gidB. NID: g40020. >gp:gp|Z99124|BSUB0021_207 B
acillus subtilis complete genome (section 21 of 21): from 39
99281 to 4214814. NID: g2636442.

atggattttgatacgattacaagtatttcaacaccgatgggtgaaggtgctattggaatt 
gtgagattatctgggccacaagctattgaaatcggagatatcttatataaaggtaagaaa 
aagttatctgaagttgagacgcatacaataaattacggtcatattattgatccagaaaca
aatgaaacagttgaagaagtcatggtgtctgtattacgtgcccctaaaactttcacacga 

gaagatattattgagataaattgtcatggtggtattttaacaattaatcgtatattagag
ttfiactatgacttatggtgcacgtatggcagaaccaggtgaatatacaaaacgtgcattt 
ttaaatggtcgtatagatttatctcaagcagaagcggttatggattttatacgttccaaa
actgatcgagcttctaaggttgcgatgaatcaaatagaaggacgtttaagtgacttaatc 
aagaaacaacgtcaatccatattagagatactcgcccaagttgaagttaacattgattat 

ccagagtatgatgatgtagaagacgcaacgacggacttcttactagaacagtctaagcgt
attaaagaagaaatcaatcagttacttgaaacaggagcacaaggtaaaataatgagagaa
gggttatctacagttattgtaggacgtcctaatgttgggaagtcttcgatgctaaataac 
cttattcaagataataaagcaattgtgactgaggtcgctggtacaacaagagacgtgtta 
gaagaatatgtcaatgttagaggtgtcccgttacgacttgtagatactgcgggtattagg 

gatactgaagatatcgtagagaagattggtgtagaacgttctaggaaagctttaagtgaa
gcagatttaattttatttgtgcttaataacaatgaacctctgacggaagatgatcaaact 
ttattcgaagtcattaaaaatgaggatgttattgtaatcattaataaaacagatttagaa 
cagcgattagatgttagcgaactaagagagatgattggtgatatgccacttatacaaaca 
tcgatgcttaaacaagaaggtattgatgaattagaaatacaaattaaagatttattcttt 

ggtggcgaagtacaaaatcaagatatgacttatgtatctaattcacgtcacatttcattg
ttgaaacaagcgagacaatcaattcaagatgcgattgatgctgctgagtctggtatccca 
atggatatggtacagattgatttaacacgtacttgggaaattctaggagaaattattgga 
gaatcagcgagtgatgaattaatagatcaactatttagtcaattttgtttaggaaaataa 
Sequence 2462

MDFDTITSISTPMGEGAIGIVRLSGPQAIEIGDILYKGKKKLSEVETHTINYGHIIDPET 
NETVEEVMVSVLRAPKTFTREDIIEINCHGGILTINRILELTMTYGARMAEPGEYTKRAF 
LNGRIDLSQAEAVMDFIRSKTDRASKVAMNQIEGRLSDLIKKQRQSILEILAQVEVNIDY
PEYDDVEDATTDFLLEQSKRIKEEINQLLETGAQGKIMREGLSTVIVGRPNVGKSSMLNN 
LIQDNKAIVTEVAGTTRDVLEEYVNVRGVPLRLVDTAGIRDTEDIVEKIGVERSRKALSE 
ADLILFVLNNNEPLTEDDQTLFEVIKNEDVIVIINKTDLEQRLDVSELREMIGDMPLIQT 
SMLKQEGIDELEIQIKDLFFGGEVQNQDMTYVSNSRHISLLKQARQSIQDAIDAAESGIP 
MDMVQIDLTRTWEILGEIIGESASDELIDQLFSQFCLGK*

Sequence 2463

Contig_0762_pos_4295_2418,

is similar to (with p-value 0.0e+00)
>sp:sp|P25812|GIDA_BACSU GLUCOSE INHIBITED DIVISION PROTEIN
A. >pir:pir|JQ1216|BWBSGA gidA protein - Bacillus subtilis >
gp:gp|D26185|BAC180K_59 B. subtilis DNA, 180 kilobase region
of replication origin. NID: g467326. >gp:gp|X62539|BSORIGS_
6 B. subtilis genes rpmH, rnpA, 50kd, gidA and gidB. NID: g40
020. >gp:gp|Z99124|BSUB0021_206 Bacillus subtilis complete g
enome (section 21 of 21): from 3999281 to 4214814. NID: g263
6442.

gtggttcaagaatatgatgtagtagtcattggtgctggtcacgccggtattgaagcaggt
ctagcttcagctcgccggggtgctaaaacactgatgttaacaattaatttagataatatt 

gctttcatgccatgtaatccatctgtaggtggtcctgcgaaaggaatcgttgtacgtgaa
atagacgctttaggtggacaaatggcaaaaactattgataaaactcacattcaaatgcgt 
atgcttaatacaggtaaaggtccagctgttagagctttacgtgctcaagcagataaagta 
ttatatcaacaagaaatgaagcgtgtacttgagaatgaggataatttagacatcatgcaa 
ggtatggttgatgaactcattatagaagataatgaagttaaaggtgttcgtactaatatt 

ggtacagaatatcgttctaaagctgtcattattacaacaggtacattcttacgtggagaa
attatactaggaaacttaaaatattctagtggccctaaccatcaattaccatctgtaact 
ctagcggataatttaagaaaattaggatttgatatcgttagatttaaaacgggtac-ncca 
ccaci.vtqiaaatgcgagaaccatcgattattctaaaactgaaatccaaccaggtg.jcgat 
ataggtcgagcgtttagttttgaaacaaccgaatttattttagatcaattaccttgttgg 

ttaacttatacaaatggagatacacatcaagtcattgatgataacttacatttatctgct
atgtattccggtatgattaaaggtacaggtcctagatattgtccatcaattgaggataaa 
tttgtccgctttaacgataaaccaagacatcaacttttcttagaacctgaaggacgtaat 
acgaatgaggtatacgtgcaaggattatctactagtttacctgaacatgttcaacgtcaa 
atgttagaaaccattccaggtcttgaaaaagcagatatgatgcgtgcgggttatgctatt 

gaatatgatgcaatcgtgcctactcaattatggccaacgttagaaacaaaagcgattaaa
aacttgtatactgcaggtcagattaatggaacatcaggatatgaagaagcagcgggacaa 
ggaatcatggcaggtattaacgctgctggtaatgttttaggtacaggtgaaaaaatactc 
agccgttcagacgcatatattggtgtacttatagatgatttagtcactaagggtacaaat 
gaaccgtatcgattattaacttcacgtgcggaatatcgattattactacgtcatgataat 

gctgatttacgtcttactgatatggggtatgaattaggtttaatatcagaagaacgctat
gcaagatttaatgaaaagcgtcaacaaatcaaagatgaaatacaacgacttaccgatgta 
cgtattaaaccaaatgaacatacgcaagcaattattgaagctaagggtggttcaagatta 
aaagatggcatattagcgattgatttattacgtcgtcccgaaatgaactacgaaacaatt 
ttagaaatcttagaagaatcacatcaacttcctgaagcggttgaggaacaagttgaaatt

caaacaaaatatgaaggttatatcaataaatctttacaacaagtagaaaaagttaaaaga
atggaagcgaaaaaaattcctgaggatttagattatagcaaggtagatagtttagcatct 
gaagcacgcgaaaagttagctgaagttaaaccattaaatattgcacaggcttcaca.'iatt 
tcagyCQ'-gaatccagcagatatctcaattctacttgtttatttagaacaaggta.'actt 
caaagggtgaaacaataa 


Sequence 2464

VVQEYDVVVIGAGHAGIEAGLASARRGAKTLMLTINLDNIAFMPCNPSVGGPAKGIVVRE 
IDALGGQMAKTIDKTHIQMRMLNTGKGPAVRALRAQADKVLYQQEMKRVLENEDNLDIMQ 
GMVDELIIEDNEVKGVRTNIGTEYRSKAVIITTGTFLRGEIILGNLKYSSGPNHQLPSVT 
LADNLRKLGFDIVRFKTGTPPRVNARTIDYSKTEIQPGDDIGRAFSFETTEFILDQLPCW
LTYTNGDTHQVIDDNLHLSAMYSGMIKGTGPRYCPSIEDKFVRFNDKPRHQLFLEPEGRN 
TNEVYVQGLSTSLPEHVQRQMLETIPGLEKADMMRAGYAIEYDAIVPTQLWPTLETKAIK
NLYTAGQINGTSGYEEAAGQGIMAGINAAGNVLGTGEKILSRSDAYIGVLIDDLVTKGTN
EPYRLLTSRAEYRLLLRHDNADLRLTDMGYELGLISEERYARFNEKRQQIKDEIQRLTDV
RIKPNEHTQAIIEAKGGSRLKDGILAIDLLRRPEMNYETILEILEESHQLPEAVEEQVEI 
QTKYEGYINKSLQQVEKVKRMEAKKIPEDLDYSKVDSLASEAREKLAEVKPLNIAQASRI
SGVNPADISILLVYLEQGKLQRVKQ* 

Sequence 2465

Contig_0762_pos_1424_816,

is similar to (with p-value 1.0e-40)

>sp:sp|P37524|YYAA_BACSU HYPOTHETICAL 32.8 KD PROTEIN IN SPO
OJ-GIDB INTERGENIC REGION. >pir:pir|S18078|S18078 hypothetic
al protein 3 - Bacillus subtilis >gp:gp|D26185|BAC180K_57 B.
subtilis DNA, 180 kilobase region of replication origin. NI
D: g467326. >gp:gp|X62539|BSORIGS_8 B. subtilis genes rpmH, r
npA, 50kd, gidA and gidB. NID: g40020. >gp:gp|Z99124|BSUB002
1_204 Bacillus subtilis complete genome (section 21 of 21):
from 3999281 to 4214814. NID: g2636442.

atgttcgaaattatagccggcgaacgacgatttcgagcattacagtcgttgcataaacct 
caagtagatgtcattgttcgagatatggatgatgaagaaacagcggtagttgcattgatt 
gaaaatattcaacgtgaaaacttatctgtcgtcgaagaagcggaagcttataaaaagtta 
cttgaaatcgggggaacgactcaaaatgaattagcaaaaagtctaggcaagagccaaagc 

ttcattgctaataaacttagattattgaagttagcacccaatgtgattaagagattacgt
gaaggtaagattacagaaagacatgcacgagcggtattagtattacctgatgaaacacaa 
gaagaattaatcgagcaagttattagtcagaagttgaatgtgaaacaaactgaggataga 
gtacgtcagaaaactggaccagagaaagtaaaagcgcagactttccaattttctcaagat 
gtaacacaagcaaaagaagaactaggtaagagtattgaaacgatagaaaaatcaggtata 

cgcgttgaacaaaaagataaagaacatgaagattattatgaaattaaaataaagatatat
aagaaataa 

Sequence 2466 

MFEIIAGERRFRALQSLHKPQVDVIVRDMDDEETAVVALIENIQRENLSVVEEAEAYKKL
LEIGGTTQNELAKSLGKSQSFIANKLRLLKLAPNVIKRLREGKITERHARAVLVLPDETQ
EELIEQVISQKLNVKQTEDRVRQKTGPEKVKAQTFQFSQDVTQAKEELGKSIETIEKSGI 
RVEQKDKEHEDYYEIKIKIYKK* 

Sequence 2467
Contig_0763_pos_624_1304,

is similar to (with p-value 2.0e-26) 

>gp:gp|U76260|PAU76260_1 Peptostreptococcus asaccharolyticus
alpha- and beta-subunits of L-serine dehydratase (sdhB) and
(sdhA) genes, complete cds. NID: g2315864.

atggctaaaagctatgattatcaaagtgctttcgatattattggaccagtaatgacggga
ccttcaagttctcatacagcaggtgcagtaaaaattggtaattcagcgagagctgtgtta 
ggagatatgcctaagcatatagaaattcgttattatgaatcttttgctaaaacgcatcaa 
gggcatggtacagacgttgctattgtcggaggtgctatgggctacagcactttcgatagt 
agaattaaatcatccttagacatagcaaaagatgaaaatattacaattgatattattgaa 

gatgaaggagaaagtattggtcaacatcctaactgtgcttatatcaaagcaaatacgaaa
gacggacgttatatagaagtgataggtatttctattggtggcggtacaatcaaactaaaa 
ggtatcaatgtaaatggtttaaatgtggaactgaatcatgggcttccaatgttagttata 
gatggaaatatgaataaagctaaaataaatcatcttattaatgatttatcagatatggac 
ttagacttaggtgaagaattaatagaaacaaatgataatgaaggtttagttgtatttcct 

ttaaataaagcaatctcagaatcagcattaaatattattaaagataaacatagtgattta
aacgtttcctatatcaaatag 

Sequence 2468 

MAKSYDYQSAFDIIGPVMTGPSSSHTAGAVKIGNSARAVLGDMPKHIEIRYYESFAKTHQ
GHGTDVAIVGGAMGYSTFDSRIKSSLDIAKDENITIDIIEDEGESIGQHPNCAYIKANTK
DGRYIEVIGISIGGGTIKLKGINVNGLNVELNHGLPMLVIDGNMNKAKINHLINDLSDMD
LDLGEELIETNDNEGLVVFPLNKAISESALNIIKDKHSDLNVSYIK* 

Sequence 2469

Contig_0763_pos_3838_4449,

putative peptide of unknown function 

atgaagcagtggatgaatagattaatcaccttaataggcgtattgttaatcattttagct 
atttatttattctcaaagccatatatcgataattatctacatgaaaaagataacgatcat 
aaaattgaaaattatgataaaaaggaaaaagaacagacaaagacatctaaatcgacgcca
aagataccttccgataaatctaaaatggctggttatatagaagttccagatgcacaaata 
aaagaaccagtataccctggtccagcaacaccagaacaactcaatagaggtgttagtttt 
gcagaaggtgacgaatctcttaatcaacagaatatttcaattgctggtcatacgtttaca 
gatcgttcgcactatcaatttacaaatttaaaatcagccaaaatcggtagtaaagtgtat 
tttaaaactggaaatcaaactagaaagtataaaataactaaaatacgtgatgttaagcct
acngaggttaaggtattagacgaacatcctaataagaaaaatcaattaacattaatcact
tgcgatgactataacgaagaaacgggtgtttgggaaacaaggaaaatattcatagctaca
caaattaactaa 

Sequence 2470 

MKQWMNRLITLIGVLLIILAIYLFSKPYIDNYLHEKDNDHKIENYDKKEKEQTKTSKSTP 
KIPSDKSKMAGYIEVPDAQIKEPVYPGPATPEQLNRGVSFAEGDESLNQQNISIAGHTFT 
DRSHYQFTNLKSAKIGSKVYFKTGNQTRKYKITKIRDVKPTEVKVLDEHPNKKNQLTLIT
CDDYNEETGVWETRKIFIATQIN* 

Sequence 2471 
Contig_0763_pos_4794_5603,
putative peptide of unknown function 

atgattaaagccattgcggtagatatggatggaacatttcttgacacaaataaacagttt 
gatcgaaatcgttttgaaactatttttaaagaattaatagataaaaatattaagtttata 
gctgcgagtgggaatcaatttgcaaagctaaaatcaatttttggagatagggaaatgttc 
tttatatctgaaaatggagcagtcatctataaaggtaatcaactttacaattatcgaagt 
tttgatcagtatatttttcaaaaagttgtaaattatttaaatttgaatcaaaagataaac 
aatttgattatttgtggtgtaaaaagtgcatatattttaaaagaaacaagcgaagcattt 
aagcaagatgcacgtacatattatcaccaactaatagaggttgactccttacaaacatta 
cctgatgatgattatgtgaaaattgctttcaatataaatcgtcagactcatccagactta
gatgagaaattagctcttaagtttaaagacgatattaaactagtatcaagtgggagagat 
agtatagatgttattatgccaaatatgactaagggtcaagctttgtctagattattaaaa 
gaatggcaaatgcctgcttcacatttaatggcatttggagatgcaaataacgataaagat 
atgttggagcttgccgaacatagttatgttatggctaatagtgaagatcaatcattattt 
aatatagcgagtcatgtggcaccttccaatgatgaacaaggcgtactatcaacaatcgaa 
aatgttgttctcggttattccaataaataa 

Sequence 2472 

MIKAIAVDMDGTFLDTNKQFDRNRFETIFKELIDKNIKFIAASGNQFAKLKSIFGDREMF
FISENGAVIYKGNQLYNYRSFDQYIFQKVVNYLNLNQKINNLIICGVKSAYILKETSEAF
KQDARTYYHQLIEVDSLQTLPDDDYVKIAFNINRQTHPDLDEKLALKFKDDIKLVSSGRD 
SIDVIMPNMTKGQALSRLLKEWQMPASHLMAFGDANNDKDMLELAEHSYVMANSEDQSLF 
NIASHVAPSNDEQGVLSTIENVVLGYSNK* 


Sequence 2473

Contig_0763_pos_7023_7691,

is similar to (with p-value 2.0e-20)

>gp:gp|AF012552|AF012552_2 Helicobacter pylori prolipoprotei
n diacylglycerol transferase (lgt) and NADPH-linked flavin n
itroreductase (rdxA) genes, complete cds. NID: g2564440.
atgattatgaatcagatgaatcaaacgattattgatgcattccattttagacatgcgaca 
aaagaatttgaccctacgaaaaaaattagtgatgaagattttaatacgattttagaaaca 
ggtagattatctccaagttcactaggtttagaaccttggcactttgtagtggttcaaaat 



aaagaattgagagaaaaattgaaagcctatagttggggagcacaaaagcaacttgataca
gcaagtcactttgtattaatttttgctcgtaagaatgtgacggctcatacagattacgtg 
caacatttacttcgtggcgtcaaaaaatatgaagaaagtacaattccagcagttgaaaat 
aaatttgatgatttccaagaaagtttccatattgccgataatgaacgaacattatatgac 
tgggcgagtaaacaaacatatattgcattagcaaacatgatgacaagtgctgcattacta
ggtatcgactcatgtccaattgaaggatttgatttagataaagtgactgaaattctttca 
gatgagggtgttttagatacggaacaatttggtatttcagttatggtaggctttggttac 
agagcacaagaacctaaacatggcaaagttagacaaaacgaagacgacatcattagttgg 
attgaataa 

Sequence 2474

MIMNQMNQTIIDAFHFRHATKEFDPTKKISDEDFNTILETGRLSPSSLGLEPWHFVVVQN 
KELREKLKAYSWGAQKQLDTASHFVLIFARKNVTAHTDYVQHLLRGVKKYEESTIPAVEN
KFDDFQESFHIADNERTLYDWASKQTYIALANMMTSAALLGIDSCPIEGFDLDKVTEILS
DEGVLDTEQFGISVMVGFGYRAQEPKHGKVRQNEDDIISWIE*

Sequence 2475 
Contig_0763_pos_6789_5797,
is similar to (with p-value 0.0e+00)
>gp:gp|U31175|SAU31175_1 Staphylococcus aureus D-specific D-
2-hydroxy acid dehydrogenase (ddh) gene, complete cds. NID: g
1644432.

atgacaaaaattatgtttttcggcacaagagcatatgagaaggacatggcattacgttgg 
ggaaagaaaaataatatcgatgtcactacatcaacagaacttttaagtgtagatactgtc 

gatcaattaaaagattatgacggtgttacaacaatgcagttcggtaaattagaacctgaa
gtttaccctaaattagagtcctatggtattaaacaaattgcacaacgtacggctggattt
gatatgtatgacttagaacttgcaaaaaaacatgaaattattatctcgaatataccnagt
tattcacctgaaacaattgctgaatattcggtatctatcgctctgcaactcgtacgaaaa
ttcccaacaattgaaaaacgtgtgcaagcacataatttcacatgggcgtcccctattatg 

tctcgtccagtaaaaaatatgactgtagcaatcatcggtacagggcgtattggtgctgca
actggtaaaatctatgctggttttggtgcgagagtagttggttatgatgcatatcctaat 
cattctttatctttcttagaatataaagaaacagtagaggatgcaattaaagatgctgat
attatctcattacatgtacccgctaataaagatagtttccatttatttgataacaatatg 
tttaaaaatgttaaaaaaggtgccgttttagtcaatgccgcaagaggagctgtgataaac 

acgcctgatttaattgaagcagtaaataatggtacattatcaggtgctgccattgacaca
tatgaaaatgaagctaattatttcacatttgattgttcaaatcaaacgattgacgaccca 
atattattagacctaattagaaatgaaaatattttagttacacctcatattgcctttttc 
tccgatgaagcagtacaaaatttagtagagggtggtttgaatgcagcattatcagtaatt 
aatactggcacatgtgatacgcgattaaactaa 

Sequence 2476

MTKIMFFGTRAYEKDMALRWGKKNNIDVTTSTELLSVDTVDQLKDYDGVTTMQFGKLEPE 
VYPKLESYGIKQIAQRTAGFDMYDLELAKKHEIIISNIPSYSPETIAEYSVSIALQLVRK 
FPTIEKRVQAHNFTWASPIMSRPVKNMTVAIIGTGRIGAATGKIYAGFGARVVGYDAYPN
HSLSFLEYKETVEDAIKDADIISLHVPANKDSFHLFDNNMFKNVKKGAVLVNAARGAVIN
TPDLIEAVNNGTLSGAAIDTYENEANYFTFDCSNQTIDDPILLDLIRNENILVTPHIAFF 
SDEAVQNLVEGGLNAALSVINTGTCDTRLN*

Sequence 2477
Contig_0763_pos_3669_3175,

is similar to (with p-value 3.0e-17)

>pir:pir|JT0409|JT0409 phosphinothricin-N-acetyltransferase
- Streptomyces viridochromogenes >gp:gp|M22827|STMPAT_2 Stre
ptomyces viridochromogenes phosphinothricin N-acetyltransfer
ase (pat) gene, complete cds. NID: g295177. >gp:gp|X65195|SV
PTT_3 S. viridochromogenes genes pms, phsA, pat and dea for p
hosphinomethylmalic-acid-synthase, phosphinothricin-tripepti
de-synthetase A, phosphinothricin-N-acetyltransferase and N-
acetylphosphinothricin-tripeptide deacetylase, respectively.
NID: g47997. >gp:gp|A02774|A02774_1 Artificial phosphinothr
icin resistance gene. NID: g345279. >gp:gp|A02804|A02804_1 S
. viridochromogenes phosphinothricin resistance gene. NID: g3
45154. >gp:gp|A29201|A29201_1 Synthetic DNA for phosphinothr
icin resistance gene (viral/herbicide resistance) from paten
t WO9111517. NID: g1248925.

atgattagatttgcacgactagaagatcttcaagatattttgacaatttataatgatgcc 
atccttaatacaacagctgtttatacgtataagccacaacaattagatgaacgtcttcaa 
tggtatcaatctaaagcaaaaataaacgaacctatatgggtttatgaaaaagaagggaaa 

gtagttggttttgccacttatggttcctttagacaatggccggcctatttatatactatt
gaacattctatatatgttcatcaacagtacagaggactaggtatcgcttctcaattatta 
gagaatttaattcgttacgctaaagaacaaggttatcgcaccattgttgctgggattgat 
gcatcgaacatggatagtatcgcattgcataagaagtttgacttctcacatgcaggtaca 
attaaaaatgtaggttataaatttgatcgatggctcgatttatcattttatcaatatgat 

ttatctgattcataa

Sequence 2478 

MIRFARLEDLQDILTIYNDAILNTTAVYTYKPQQLDERLQWYQSKAKINEPIWVYEKEGK
VVGFATYGSFRQWPAYLYTIEHSIYVHQQYRGLGIASQLLENLIRYAKEQGYRTIVAGID
ASNMDSIALHKKFDFSHAGTIKNVGYKFDRWLDLSFYQYDLSDS*

Sequence 2479

Contig_0764_pos_4801_5163,
is similar to (with p-value 2.0e-52)
>sp:sp|Q05217|SECY_STACA PREPROTEIN TRANSLOCASE SECY SUBUNIT
. >pir:pir|S30115|S30115 secY protein - Staphylococcus carno
sus >gp:gp|X70086|SCSECY_1 S.carnosus secY gene. NID: g49188

atgattatttatgtagttttaattattgcatttgcatatttttatgcttttgtacaagtt 
aatcctgaaaaaatggcagataaccttaaaaagcaaggtagttatgtcccaggaattaga
cctggtgaacaaacaaaaaaatatattactaaagtactttatagattgacttttgttggt 
tcaattttcttagcagctatagctattttacctataattgcgactaaatttatgggctta 
ccacaatcaattcaaattggtggtacgagtcttttgatcgttattggtgtagctattgaa 
actatgaaaactttagaagcacaagtcactcaaaaagaatataaaggctttggtggtaga 
taa

Sequence 2480

MIIYVVLIIAFAYFYAFVQVNPEKMADNLKKQGSYVPGIRPGEQTKKYITKVLYRLTFVG
SIFLAAIAILPIIATKFMGLPQSIQIGGTSLLIVIGVAIETMKTLEAQVTQKEYKGFGGR
*

Sequence 2481 

Contig_0764_pos_6903_5197, 

is similar to (with p-value 2.0e-26)

>sp:sp|P54159|YPBR_BACSU HYPOTHETICAL 137.4 KD PROTEIN IN BC
SA-DEGR INTERGENIC REGION. >gp:gp|L77246|BACYACA_6 Bacillus
subtilis (YAC10-9 clone) DNA region between the serA and kdg
loci. NID: g1256615. >gp:gp|Z99115|BSUB0012_144 Bacillus su
btilis complete genome (section 12 of 21): from 2195541 to 2
409220. NID: g2634478.

atgaaacgtattaatgaagttggtataccaattatttttgtgattaatcagattgataaa 
cataatgaagaagaaattacatttgaaacttttaaatcaagagtcgaaaaatcaatcaaa 
gactgggatatcaaacttcaagatacttattacgtttcaaagtttgatcatccacagaat 
gaaattgacaaactttcaaattttctagtatttatggatcaacatcgtgaatcaacagaa 

gactatgttaatagaaccattcaattcattaccgacgcacaatacatatacattcaaaat
gaaatgcaatctattcttgacacccttcaaattaatgaagaacaattcgaggaagcatat 
attcaatttcaacaaaatcaagaagtcagcgcagaagcacaattgctcaatgactctaat 
canttntttaattatttaaaacagaagcgtaaagatatattagataatgcttatatnatg
acgtacgatatgcgtgaatctttacggaattatttagaaagcatggcaactgattttaaa
gtgaatggattttttaataaaaggaagaaaaaagaagaagaacaaatcaaacgacttaat
gaggcgaccactcaattgcaagagaaagttaatcaacaagtacgacaaccacttcgtgaa 
gatatgtcatttttaactagattcataaataaacatgctgtgaatgaaaaaatactaaat 
caagaatatgacgtcgttccgtcacttatatcagagctatatcaaactcaaacgagcatt 
agcaacacatacgttttaacattttcagatgaagttataaaagctttgaataaaaaaata
gaaaatgagtcaacaccactatttgaagaagctgtcaatcatgtacaagttaatgaatta 
tcgagtgatgaaaatgaagataggtatgaatatgatagatacattgaacttaacacatta 
aaggattcgcttacatcccacaactacaaacattactatatccatttagacgattcttta 
gataaattaattggaagaacagagactcattttgaacttaaacaagaaaattcaactgct 

tancatcgtaaacatgagacacaacatcgtaacgagtttgttacatctaatcaagatatt
aagcgtgcattagatatcgttaaagatgtaccattatttgatcgcactaaacaagatatc 
accgataccattctgagactcgataatcaaataacaaaagttggtgtttttggtacattt 
agcgcaggtaaaagcagcttaatcaacgctctactaggtgaaaactatttagttagttct 
cctaatccaacaacagcggcaacgacggaattatcatatggtaaagagagtcaaatcaca 

ttaaaatcgaaagaacaattactagaggaagttaatcatgtactagaattttatgaaata
tcgtttaacacattagacgactttattgagagtgatttagataagttaaaattgaaacta 
gaaaagaaccaacttgcatttattagtgcaattgagaaacattatgaaatgtacacatct 
atgttagaacattcacttatacacacagtatcgcttgaagaaattaaaaaatggagtgcc 
gaggatgagtatgctactttcgtgaaaactgtacaccttaagctacctttagattggctc 

aagggtaaaatcattattgcccattaa

Sequence 2482 

MKRINEVGIPIIFVINQIDKHNEEEITFETFKSRVEKSIKDWDIKLQDTYYVSKFDHPQN 
EIDKLSNFLVFMDQHRESTEDYVNRTIQFITDAQYIYIQMEMQSILDTLQINEEQFEEAY 

IQFQQNQEVSAEAQLLNDSNQLFNYLKQKRKDILDNAYIMTYDMRESLRNYLESMATDFK
VNGFFNKRKKKEEEQIKRLNEATTQLQEKVNQQVRQPLREDMSFLTRFINKHAVNEKILN 
QEYDVVPSLISELYQTQTSISNTYVLTFSDEVIKALNKKIENESTPLFEEAVNHVQVNEL 
SSDENEDRYEYDRYIELNTLKDSLTSHNYKHYYIHLDDSLDKLIGRTETHFELKQENSTA 
YHRKHETQHRNEFVTSNQDIKRALDIVKDVPLFDRTKQDITDTILRLDNQITKVGVFGTF

SAGKSSLINALLGENYLVSSPNPTTAATTELSYGKESQITLKSKEQLLEEVNHVLEFYEI
SFNTLDDFIESDLDKLKLKLEKNQLAFISAIEKHYEMYTSMLEHSLIHTVSLEEIKKWSA 
EDEYATFVKTVHLKLPLDWLKGKIIIAH* 

Sequence 2483
Contig_0764_pos_4419_4120,

is similar to (with p-value 5.0e-52)

>sp:sp|P43148|SEPA_STAEP EXTRACELLULAR ELASTASE PRECURSOR (E
C 3.4.24.-) (SEPP1). >pir:pir|A40659|A40659 elastase, SepP1=
extracellular metalloprotease - Staphylococcus epidermidis
>gp:gp|X69957|SESEPP1A_1 S. epidermis gene for protease. NID
: g396258.

atgtctaatccagagcgttttggacaaccatctcatatgaatgattttgtttatacaaat 
tctgacaacggaggcgtacatacgaattcaggtattccgaacaaagcagcttacaacaca 
attcgtagtattggtaaacaacgttctgaacaaatttattatagagcgttaactgtttat 
ttaacttcaaattctgatttccaagatgccaaagcatcattacaacaagcagcatttgat
ttatatggcgacggtattgctcaacaagtaggtcaagcatgggacagtgttggcgtgtag 



Sequence 2484 

MSNPERFGQPSHMNDFVYTNSDNGGVHTNSGIPNKAAYNTIRSIGKQRSEQIYYRALTVY
LTSNSDFQDAKASLQQAAFDLYGDGIAQQVGQAWDSVGV* 

Sequence 2485 

Contig_0764_pos_3606_3145,
putative peptide of unknown function

atgaaaaagagtaaacgacaagatttagtaactatgattgttaagcaaaatcacatttat 
aaaaaagcagatattattgattacattgatgatcactttggtgtacgttatagcatgact 
actattgcgagagatttaagagaacttcatatttatcgcctgcctgtgaaggcaaatcaa 
tatgaatacaaattacttacgcaacaatctcaattagactcaagagtaagactaaatgat
tatatagaaacagaaattattaacactatgattaaagaatcgtatatactaataaagacg
acaccaggatttgcacaaagcataaattattatattgatcagttacaactgaaagagatt
ataggaacgataagtggtaatgatacaattatgattcttacacattcccagtctatagct 
gaatatgtctattacaaaatatttaatcataattattcataa 

Sequence 2486

MKKSKRQDLVTMIVKQNHIYKKADIIDYIDDHFGVRYSMTTIARDLRELHIYRLPVKANQ 
YEYKLLTQQSQLDSRVRLNDYIETEIINTMIKESYILIKTTPGFAQSINYYIDQLQLKEI 
IGTISGNDTIMILTHSQSIAEYVYYKIFNHNYS* 


Sequence 2487 

Contig_0764_pos_2812_1742,

is similar to (with p-value 0.0e+00)

>gp:gp|Y17554|BLY17554_1 Bacillus licheniformis arcA, arcB,
arcC and arcD genes. NID: g3687415.

atgagtatgacaaatggacctattcaagttaacagtgaaatcggcaaattaaagacggta 
ttacttaaaagaccagggaaggaattagaaaacttagtgcctgattacttagatggatta 
ttgtttgatgatattccatttttaaaggtagctcaacaagagcatgatcattttgctcaa 
gttcttcaagatgaaggaatagaagtgctttatttagaaaaattagcagcacaaagtata 

gaagattcaaatgtcagagagcaatttatagatgacgttttagcagaatctagaaaaact
atcctaggtcatgaaaaagaaataaaaaaactcttttcaactttgtcaaatcaagcatta 
attaataagattatggctggcgtacgcaaagaagaaatacaacttgaatcgacacatctt 
gtagagtacatggatgataaatatccgttttatcttgacccaatgcctaatctatatttc 
acacgtgatcctcaagcttcaattggtagaggtatgacagtaaatcgtatgttttggaga 

gcgagacgcagagaatcgattttcatttcatatattttaaaacatcatcctagatttaaa
gatgagaatattcctttatgggtggatcgtgactgtccgttcaacatcgaaggtggagac
gaactggtgttatctaaagatgtacttgcaatagggatatctgaacgtacttctgcacaa
gcaattgaacgtttagcacgacgtatttttaaagatccgttatctacttttaaaaaggtg 
gtggcgattgagattccaactagtcgaacatttatgcacttagatactgtttgtacaatg 

attgattacgacaaattcactacacattcagcaattcttaaatcagaaggaaacatgaat
atctttattatcgaatatgatgataaagctgaagatatcaaaatccaacattctagtcat 
cttaaacaaacattagaagaagtgctcgatgttgatgaaatcacattaataccaactgga 
aatggtgatatcatcgacggtgctcgtgaatggatgtccaggtcctggtga 

Sequence 2488

MSMTNGPIQVNSEIGKLKTVLLKRPGKELENLVPDYLDGLLFDDIPFLKVAQQEHDHFAQ 
VLQDEGIEVLYLEKLAAQSIEDSNVREQFIDDVLAESRKTILGHEKEIKKLFSTLSNQAL 

INKIMAGVRKEEIQLESTHLVEYMDDKYPFYLDPMPNLYFTRDPQASIGRGMTVNRMFWR 
ARRRESIFISYILKHHPRFKDENIPLWVDRDCPFNIEGGDELVLSKDVLAIGISERTSAQ 
AIERLARRIFKDPLSTFKKVVAIEIPTSRTFMHLDTVCTMIDYDKFTTHSAILKSEGNMN
IFIIEYDDKAEDIKIQHSSHLKQTLEEVLDVDEITLIPTGNGDIIDGAREWMSRSW*

Sequence 2489

Contig_0765_pos_772_1692,

putative peptide of unknown function

gtgatttcatctactggttgttttgttactttttctgttggttcaccttcgccaactttt
tcccctgttaatgggttcttagttgttggtgttgtaattgtttttgttcctggttcacct 
ttctgtttaacgcgctctttacctggttttaaatcaggattgaattcacgtttcttgtcg 
aatggaatttcttccgttgacgtgatcggatctccatcaactggaccatattttgtcaca 

tcatccactggtggtgtgactacttcgcctgtatcagggtttttaactcctggtttacct
ggaacgtcctcttggctacctttcggtgcatttggatcaaattcatccttatggcctggc 
ttgatttcttcgccaccatattctgtgatttcatctactggttgttttgttattttttct 
gttggttcaccttcgccaactttttcccctgttaatgggttcttagttgttggtgttgta 
attgtttttgttcctggttcacctttttgtttaacacgctcttcacctggttttaaatca 

ggattgaattcacgtttcttgtcgaatggaatttcttccgttgacgtgatcggatctcca
tcaactggaccatattttgtcacatcatccacaggtggagtaactacttcgcctgtatca 
ggatttttaaccccccggcttacctggttgcgttgtttgactacctttcggtgcatttgg 
atcaaattcatccttatggcctggcttgatttcttcgccaccataatgaacgatttcatc 
cactggttgttttgttattttttctgttggttcaccttcgccaactttttctcctgtatt
aggattgacataagttggtgttgttgttgtttcaattcctggttcacctttttggactac
tttttctgtacctggggctaa 

Sequence 2490

VISSTGCFVTFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSLPGFKSGLNSRFLS
NGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGTSSWLPFGAFGSNSSLWPG 
LISSPPYSVISSTGCFVIFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSSPGFKS
GLNSRFLSNGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPRLTWLRCLTTFRCIW 
IKFILMAWLDFFATIMNDFIHWLFCYFFCWFTFANFFSCIRIDISWCCCCFNSWFTFLDY 
FFCTWG*

Sequence 2491

Contig_0765_pos_3052_2720,

putative peptide of unknown function 

atgcagaagcacctcaatctgagccaacgaagacagaagaaggaagcaacgcaaaagcag
ctcaatctgagccaacgaaggcagaagaaggaggcaatgcagaagcagctcaatctgagc 
caacgaagacagaagaaggaagcaatgcagaagcacctcaatctgagccaacgaaggcag 
aagaaggaggcaatgcagaagcacctcaatctgagccaacgaagacagaagaaggaggca 
atgcagaagcaccgaatgttccaactatcaaagctaattcagataatgatacacaaacac 

aattttcagaagcccctacaagaaatgacctag

Sequence 2492

MQKHLNLSQRRQKKEATQKQLNLSQRRQKKEAMQKQLNLSQRRQKKEAMQKHLNLSQRRQ 
KKEAMQKHLNLSQRRQKKEAMQKHRMFQLSKLIQIMIHKHNFQKPLQEMT* 

Sequence 2493

Contig_0765_pos_2621_1443,

putative peptide of unknown function 

gtgaatttaaattatagttctccgtttatgtccttattaagcatgcctgctgatagttca 

tccaataacactaaaaatacaatagatataccgccaactacggttaaaggtagagataat
tacgatttttacggtagagtagatatcgaaagtaatcctacagatttaaatgcgacaaat 
ttaacgagatataattatggacagccacctggtacaacaacagctggtgcagttcaattt
aaaaatcaagttagttttgataaagatttcgactttaacattagagtagcaaacaatcgt
caaagtaatacaactggtgcagatggttggggctttatgttcagcaagaaagatggggat 

gatttcctaaaaaacggtggtatcttacgtgaaaaaggtacacctagtgcagctggtttc
agaattgatacaggatattataataacgatccattagataaaatacagaaacaagctggt 
caaggctatagagggtatgggacatttgttaaaaatgactcccaaggtaatacttctaaa 
gtaggatcaggtactccatcaacagattttcttaactacgcagataatactactaatgat 
ttagatggtaaattccatggtcaaaaattaaataatgttaatttgaaatataatgcttca 

aatcaaacttttacagctacttatgctggtaaaacttggacggctacgttatctgaatta
ggattgagtccaactgatagttacaattttttagttacatcaagtcaatatggaaatggt 
aatagtggtacatacgcaagtggcgttatgagagctgatttagatggtgcaacattgaca 
tacactcctaaagcagtcgatggagatccaattatatcaactaaggaaataccatttaat 
aagaaacgtgaatttgatccaaacttagccccaggtacagaaaaagtagtccaaaaaggt 

gaaccaggaattgaaacaacaacaacaccaacttatgtcaatcctaatacaggagaaaaa
gttggcgaaggtgaaccaacagaaaaaataacaaaacaaccagtggatgaaatcgttcat 
tatggtggcgaagaaatcaagccaggccataaggatgaatttgatccaaatgcaccgaaa 
ggtagtcaaacaacgcaaccaggtaagccggggggttaa 

Sequence 2494

VNLNYSSPFMSLLSMPADSSSNNTKNTIDIPPTTVKGRDNYDFYGRVDIESNPTDLNATN 
LTRYNYGQPPGTTTAGAVQFKNQVSFDKDFDFNIRVANNRQSNTTGADGWGFMFSKKDGD
DFLKNGGILREKGTPSAAGFRIDTGYYNNDPLDKIQKQAGQGYRGYGTFVKNDSQGNTSK
VGSGTPSTDFLNYADNTTNDLDGKFHGQKLNNVNLKYNASNQTFTATYAGKTWTATLSEL 

GLSPTDSYNFLVTSSQYGNGNSGTYASGVMRADLDGATLTYTPKAVDGDPIISTKEIPFN
KKREFDPNLAPGTEKVVQKGEPGIETTTTPTYVNPNTGEKVGEGEPTEKITKQPVDEIVH 
YGGEEIKPGHKDEFDPNAPKGSQTTQPGKPGG* 

Sequence 2495
Contig_0765_pos_0_1408,

putative peptide of unknown function 

gtggatgatgtgacaaaatatggtccagttgatggagatccgatcacgtcaacggaagaa 
attccattcgacaagaaacgtgaattcaatcctgatttaaaaccaggtgaagagcgtgtt
aaacaaaaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaacccatta
acaggggaaaaagttggcgaaggtgaaccaacagaaaaaataacaaaacaaccagtagat 
gaaatcacagaatatggtggcgaagaaatcaagccaggccataaggatgaatttgatcca 
aatgcaccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaaccctgat 
acaggcgaagtagtcacaccaccagtggatgatgtgacaaaatatggtccagttgatgga 

gatccgatcacgtcaacggaagaaattccattcgacaagaaacgtgaattcaatcctgat
ttaaaaccaggtaaagagcgcgttaaacagaaaggtgaaccaggaacaaaaacaattaca 
acaccaacaactaagaacccattaacaggggaaaaagttggcgaaggtgaaccaacagaa 
aaagtaacaaaacaaccagtagatgaaatcacagaatatggtggcgaagaaatcaagcca 
ggccataaggatgaatttgatccaaatgcaccgaaaggtagccaagaggacgttccaggt 

aaaccaggagttaaaaatcctgatacaggcgaagtagttactccaccagtggatgatgtg
acaaaatatggtccagttgatggagatccgattacgtcaacggaagaaattccgtttgat 
aaaaaacgcgaatttgatccaaacttagcgccaggtacagagaaagtcgttcaaaaaggt 
gaaccaggaacaaaaacaattacaacaccaacaactaagaacccattaacaggggaaaaa 
gttggcgaaggtgaaccaacagaaaaagtaacaaaacaaccagtggatgaaatcgttcat 

tatggtggcgaagaaatcaagccaggccataaggatgaatttgatccaaatgcaccgaaa
ggtagccaagaggacgttccaggtaaaccaggagttaaaaaccctgatacaggcgaagta 
gttactccaccagtggatgatgtgacaaaatatggtccagttgatggagatccgattacg 
tcaacggaagaaattccgtttgataaaaaacgcgaatttgatccaaacttagcgccaggt 
acagagaaagtcgttcaaaaaggtgaaccaggaacaaaaacaattacaacaccaacaact

aagaacccattaacaggggaaaaagttggcgaaggtgaaccaacagaaaaagtaacaaaa
caaccagtggatgaaatcgttcaTATCG 

Sequence 2496

VDDVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGEERVKQKGEPGTKTITTPTTKNPL 
TGEKVGEGEPTEKITKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPD
TGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGKERVKQKGEPGTKTIT 
TPTTKNPLTGEKVGEGEPTEKVTKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPG 
KPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPGTEKVVQKG 
EPGTKTITTPTTKNPLTGEKVGEGEPTEKVTKQPVDEIVHYGGEEIKPGHKDEFDPNAPK 
GSQEDVPGKPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPG
TEKVVQKGEPGTKTITTPTTKNPLTGEKVGEGEPTEKVTKQPVDEIVHIX 

Sequence 2497

Contig_0766_pos_1949_3403,

is similar to (with p-value 1.0e-62)

>sp:sp|P14567|DP3A_SALTY DNA POLYMERASE III, ALPHA CHAIN (EC
2.7.7.7). >pir:pir|A45915|A45915 DNA-directed DNA polymeras
e (EC 2.7.7.7) III alpha chain - Salmonella typhimurium >gp:
gp|M^^TOl|STYDNAE_1 S. typhimurium polymerase III polymerase
subunit gene, complete cds. NID: g153951.

atggaagaaataccaacttatataacccgtagacataatcctaaccaatttgcttattta 
catccagatttagaaccaatcttaaaaaacacatatggtgttatcatttatcaagaacaa 
ataatgctaatagcaagtcaagttgctggttttagttatggtgaagcagatattttaaga 
agggcaatgagtaaaaagaatcgtgcaatcttagaaagtgagcgtcaacatttcattgat 

ggtgcaaaaaataacggttacgatgaacagataagtaagcaaatttttgatttaatactt
aagtttgcagattatgggttcccacgtgcccatgctgttagttactcaaaaattgcatac 
attatgagctatttaaaagtgcactatcctcattatttttatgcaaatatcttgagtaat 
gtaataggaagtgaaaaaaagactgcagctatgattgacgaagctaagcaccaaagaatt 
agcatcttgcctcccaatattaatcaaagtcattggtattataaggcaagtaataaagga 

atatatctgtctttaggtacaattaaaggaattggatatcaaagcgttaaattaattatt
gatgaacgtcagcagaatggaccttatagagatttctttgatttttcaagacgtatacca 
aaaagggtgaaaaatagaaaattacttgagtctcttatcttagtaggcgcattcgacact 
tttggcaaaactagagcgacattattacaagcaattgatcaagtattagatttgaattct 
gatgttgagcaagatgaaatgcttttcgatcttttaactcctaaacaatcgtatgaagaa
aaagaggaactacctgatcaattattaagtgattatgaaaaagaatacctaggattctat
attagtaaacatccagttgaaaagaaatttgaaaagaaacaatatttaggcatatttcaa 
ttgtctaatggaagtcactaccaacctatacttgttcaatttgaccatatcaaacaaata 
agaacgaagaatggtcaaaatatggcatttgtaacgatgaatgatggaagaacgatgatg 
gatggagtgattttcccagataagtttaaaaaatacgaaacttctatttcaaaggaacag
atgtatatcgtattaggtaaatttgaaaagcgtaaccaacaaatgcaacttatcatcaat 
caactttttgaagttgaagcgtatgagcaaacaaaattgtctaattcgaaaaaagttatt 
ttacgtaatgtaacacatctagaaccacaatttgaacattcaaaagtagaatctaatgaa 
caacatgcattaaatatttatggttttgacgaaagtgcaaataagatgacaatgttggga 
caaattgaacgtcaacgtcaaaattttgatctattaatacaaacttattcgccagctgat
attagatttatttaa 

Sequence 2498

MEEIPTYITRRHNPNQFAYLHPDLEPILKNTYGVIIYQEQIMLIASQVAGFSYGEADILR 
RAMSKKNRAILESERQHFIDGAKNNGYDEQISKQIFDLILKFADYGFPRAHAVSYPKIAY
IMSYLKVHYPHYFYANILSNVIGSEKKTAAMIDEAKHQRISILPPNINQSHWYYKASNKG
IYLSLGTIKGIGYQSVKLIIDERQQNGPYRDFFDFSRRIPKRVKNRKLLESLILVGAFDT
FGKTRATLLQAIDQVLDLNSDVEQDEMLFDLLTPKQSYEEKEELPDQLLSDYEKEYLGFY 
ISKHPVEKKFEKKQYLGIFQLSNGSHYQPILVQFDHIKQIRTKNGQNMAFVTMNDGRTMM 
DGVIFPDKFKKYETSISKEQMYIVLGKFEKRNQQMQLIINQLFEVEAYEQTKLSNSKKVI
LRNVTHLEPQFEHSKVESNEQHALNIYGFDESANKMTMLGQIERQRQNFDLLIQTYSPAD 
IRFI* 

Sequence 2499
Contig_0766_pos_3567_4364,

is similar to (with p-value 7.0e-79)

>sp:sp|P54537|YQIZ_BACSU PROBABLE AMINO-ACID ABC TRANSPORTER
ATP-BINDING PROTEIN IN BMRU-ANSR INTERGENIC REGION. >gp:gp|
D84432|BACJH642_255 Bacillus subtilis DNA, 283 Kb region con
taining skin element. NID: g2627063. >gp:gp|Z99116|BSUB0013_
107 Bacillus subtilis complete genome (section 13 of 21): fr
om 2395261 to 2613730. NID: g2634723.

atgattgtactattgttctacatatttactagtaaatcgtgtatatttaaaatgattcag 
ttttcttttcagccagtgattaatattaaaaatttaaataaaaaatttggagcaaatgaa 

gtattgagagatattaatcttactgttgaaaagggtgaagtggttgcaataattggacca
tctggaagtggtaaaagcactttactccgttgtatgaatttgttagatgtaccttcaaaa 
ggtaaagttatatttgaagataatgaattaactcaacataatgttcatttagataattta 
cgacaaaaaatgggtatggtatttcaaaattttaatttatttcctcataaaaaggtcatt 
gaaaatgtaatgttggcaccacttttattacataaagatagtaaagatcaattaaaagag 

aaagctttatatttacttgaaaaagtggggcttaaagacaaagcagattcatatcctaat
caactgtctggaggtcaaaaacaaagagttgctattgcaagagctttggcaatggaacct 
gatgttatgttatttgatgaaccaacatctgcacttgatcctgaagtagtaggggatgtt 
ttaaaagtaatgagacaattagcgaatgaaggtatgacaatggtgattgtcacgcatgag 
atgaactttgctaaagaaataagtgataaagtagtatttatggccgatggtgttgtggtt 

gaatctggtacaccacaaaacatatttgaaaatcctcagcacagtcgaactgaaaatttt
ttatcacgagtgttataa 

Sequence 2500 

MIVLLFYIFTSKSCIFKMIQFSFQPVINIKNLNKKFGANEVLRDINLTVEKGEVVAIIGP 
SGSGKSTLLRCMNLLDVPSKGKVIFEDNELTQHNVHLDNLRQKMGMVFQNFNLFPHKKVI
ENVMLAPLLLHKDSKDQLKEKALYLLEKVGLKDKADSYPNQLSGGQKQRVAIARALAMEP 
DVMLFDEPTSALDPEVVGDVLKVMRQLANEGMTMVIVTHEMNFAKEISDKVVFMADGVVV
ESGTPQNIFENPQHSRTENFLSRVL* 

Sequence 2501

Contig_0766_pos_4706_5149,

is similar to (with p-value 1.0e-20)

>pir:pir|S05373|S05373 ctc protein - Bacillus subtilis (frag
ment)

atggtttcagattatcaatttgatccattaaaaaaccaaatcactcatattgacttttta
gcaatcaacatgagtgaagaacgtactgttgaagtacaagttcaattagttggtgaagct 
gtaggtgctaaagaaggcggcgtagttgaacaaccattattcaacttagaagttacagct 
acacctgaaaatattcctgaaactatcgaagtagatatcagtgaattacaagttaatgac 
agcttagcagtttctgatattaaaatctctggtgatttcacaatcgaaaataatccagaa
gattctatGgtaacagtagttcctccaacagatgaaccttctgaagaagaagttgaagct 
atggaaggcgaatcagcaactgaagaaccagaagttgttggtgaagacaaagaagacgat 
gaagaagaaaataaagaagactaa 

Sequence 2502

MVSDYQFDPLKNQITHIDFLAINMSEERTVEVQVQLVGEAVGAKEGGVVEQPLFNLEVTA 
TPENIPETIEVDISELQVNDSLAVSDIKISGDFTIENNPEDSIVTVVPPTDEPSEEEVEA
MEGESATEEPEVVGEDKEDDEEENKED* 

Sequence 2503

Contig_0766_pos_5387_0,

is similar to (with p-value 4.0e-49)

>sp:sp|P37470|SP5C_BACSU PROBABLE PEPTIDYL-TRNA HYDROLASE (E
C 3.1.1.29) (PTH) (STAGE V SPORULATION PROTEIN C). >gp:gp|D2
6185|BAC180K_116 B. subtilis DNA, 180 kilobase region of rep
lication origin. NID: g467326. >gp:gp|Z99104|BSUB0001_53 Bac
illus subtilis complete genome (section 1 of 21): from 1 to
213080. NID: g2632267.

gtggaggtaacaataatgaaatgcattgtcggtcttggcaacattggtaaacgttttgaa 
ttaacaagacataatattggtttcgaagttgtcgatgatattctagaacgccaccaattt
actttagacaaacaaaaatttaaaggtgcatatactattgaacgtttaaacggcgaaaaa
gtattatttattgagccaatgaccatgatgaacttatctggtcaagctgtagccccttta
atggattattataatgtcgatgttgaagatttgatcgttttatatgacgatttagattta 
gaacaaggacaagtgcgtctgcgccaaaaggggagtgcaggcggtcataatggtatgaaa 
tcgataattaaaatgcttggtacagatcaatttaaacgtattcgaattggtgttggccgt
ccaacaaatgggatgtctgttccggactatgttttacaaaaattttcaaaagaagaaatg 
atcattatggaaaaggtaattgaacattctgcaagagctgtagaatcttttattgaaagt 
tctcgttttgatcatgttatgaatgaa 

Sequence 2504

VEVTIMKCIVGLGNIGKRFELTRHNIGFEVVDDILERHQFTLDKQKFKGAYTIERLNGEK 
VLFIEPMTMMNLSGQAVAPLMDYYNVDVEDLIVLYDDLDLEQGQVRLRQKGSAGGHNGMK 
SIIKMLGTDQFKRIRIGVGRPTNGMSVPDYVLQKFSKEEMIIMEKVIEHSARAVESFIES 
SRFDHVMNE 


Sequence 2505 

Contig_0766_pos_1682_1317, 

putative peptide of unknown function 

atgtatagtgaaaaagaaatcatacgaaaagtcgaaagtttagcagagaaaattggaaaa 
ctagaagttgttcaagattatcataatgtagaaaaacaaattcataataatcaagcaata
aaacaaaagatgaatcgtttgaaagcgcaacaaaaacaatcggttaattttcaaaattat 
ggaaaacaaaatgcactcgagcaatctgaagttaaaattcagaatctaaaagatgaaatt 
aatgaattacctattgttgaagaatttcgttcagcacaatatgaagcgaatgatttacta 
caaatgatggtcaaaacaatggaagatagactcaatgaatataataaaaaagaacacaat 
gaataa

Sequence 2506 

MYSEKEIIRKVESLAEKIGKLEVVQDYHNVEKQIHNNQAIKQKMNRLKAQQKQSVNFQNY
GKQNALEQSEVKIQNLKDEINELPIVEEFRSAQYEANDLLQMMVKTMEDRLNEYNKKEHN 
E*

Sequence 2507 

Contig_0766_pos_1293_799,

putative peptide of unknown function

atgaatattagaaagttgaccatcacagcattcttaatagctattaatgtcgtgttaagt
agtttaattgtcattcctttaggtccaattaaagccgcacccgttcaacattttgtaaat 
gtattatgtgctgtatttgttggaccatggtatggtttagcgcaagcttttatttcttct 
gtactacgaatttcatttggaactggaagcgcatttgcttttccaggaagtatgataggt 
gtcttactttccagtctgttttatatgtataggaagcatatttttatggcttcagttggt
gaagtattagggactggtgttattggtagtttaatgtgtatacctttagcatggttttta 
ggacttcaagatttctttattaaaccattaatgcttatgttcatagtatcaagttttatt 
ggggctttaattagttatatattgcttattattttgaaaagaagaggtttactagataga 
tttaacaaaaattaa 


Sequence 2508 

MNIRKLTITAFLIAINVVLSSLIVIPLGPIKAAPVQHFVNVLCAVFVGPWYGLAQAFISS
VLRISFGTGSAFAFPGSMIGVLLSSLFYMYRKHIFMASVGEVLGTGVIGSLMCIPLAWFL 
GLQDFFIKPLMLMFIVSSFIGALISYILLIILKRRGLLDRFNKN* 


Sequence 2509 

Contig_0766_pos_0_613,

is similar to (with p-value 4.0e-50)

>sp:sp|P49849|MUTS_BACSU DNA MISMATCH REPAIR PROTEIN MUTS. >
gp:gp|U27343|BSU27343_2 Bacillus subtilis spore coat protein
(cotE) gene, partial cds, and mismatch repair recognition p
roteins (mutS) and (mutL) genes, complete cds. NID: g1002518

gtgaaaatagaaatggctaacattacaccaatgatgcaacaatatttaaagataaaatct 
gaatatgatgattgtttgctattttttagactcggagatttctatgaaatgttctttgat
gatgctaaagaagcatcaagagtacttgaaataacattgacgaaaagagatgctaaaaaa 
gaaaatcctattccgatgtgtggcgtaccatatcattctgctgataattacattgaaaca 
ttgattaataatgggtataaggtcgctatatgtgaacaaatggaagatccaaagcaaaca 
aaaggaatggttagaagagaagttgtaagaatcatcacaccaggaactgttatggatcaa 
30 aatggtatggatgaaaagaaaaataattatattttaagttttatcgaaaatgaagaattt 
ggattatgctattgtgatgtttctacaggcgaactcaaagtaactcatttcaaagataca 
gcaaccttgcttaatgagattacaacaattaatcccaatgaaatcgtcataaagcaagct 
ctatctgaagaattaaaaagacaaatcaacatgataactgagacgattactgttcgcgaa 
gatataTATGTTT 


Sequence 2510 

VKIEMANITPMMQQYLKIKSEYDDCLLFFRLGDFYEMFFDDAKEASRVLEITLTKRDAKK
ENPIPMCGVPYHSADNYIETLINNGYKVAICEQMEDPKQTKGMVRREVVRIITPGTVMDQ
NGMDEKKKNYILSFIENEEFGLCYCDVSTGELKVTHFKDTATLLNEITTINPNEIVIKQA
LSEELKRQINMITETITVREDIYVX

Sequence 2511 

Contig_0767_pos_5389_5703,

is similar to (with p-value 1.0e-24)

>gp:gp|AF006075|AF006075_3 Bacillus subtilis acetoin dehydro
genase enzyme system gene cluster, ribosomal protein L6-like
protein gene, partial cds, TPP-dependent acetoin dehydrogen
ase, E1 alpha-subunit (acoA), TPP-dependent acetoin dehydrog
enase, E1 beta-subunit (acoB), dihydrolipoamide acetyltransf
erase (acoC) and dihydrolipoamide dehydrogenase (acoL) genes
, complete cds, and regulatory protein (acoR) gene, partial
cds. NID: g2957145. >gp:gp|Z99108|BSUB0005_76 Bacillus subti
lis complete genome (section 5 of 21): from 802821 to 101125
0. NID: g2633055. >gp:gp|D78509|D78509_4 Bacillus subtilis Y
fjG-YfjR genes, complete cds. NID: g2780390.

atgagttcagcaattggacgtagccctgtcgcagcagctcctactgccgtgctcaatgta 
atgtgttcagcaattggtgtatcgattacacgtttacgactatatttttttgcaagncct
tttgtnacaccaaatacaccaccgaatgtatcgtcatctttgatgtggtctactttcgca
ccacctgagacatcagtaccaattaaaatgacatcctcatctttttccatagattggtca
atagcttcattaatagcccccataaatgttaacttgcgttcttcactcatcgtaaaaacc
tccttatgtggttag 

Sequence 2512 

MSSAIGRSPVAAAPTAVLNVMCSAIGVSITRLRLYFFASPFVTPNTPPNVSSSLMWSTFA
PPETSVPIKMTSSSFSIDWSIASLIAPINVNLRSSLIVKTSLCG* 

Sequence 2513 
Contig_0767_pos_5685_4645,

is similar to (with p-value 0.0e+00)

>gp:gp|AF006075|AF006075_3 Bacillus subtilis acetoin dehydro
genase enzyme system gene cluster, ribosomal protein L6-like
protein gene, partial cds, TPP-dependent acetoin dehydrogen
ase, E1 alpha-subunit (acoA), TPP-dependent acetoin dehydrog
enase, E1 beta-subunit (acoB), dihydrolipoamide acetyltransf
erase (acoC) and dihydrolipoamide dehydrogenase (acoL) genes
, complete cds, and regulatory protein (acoR) gene, partial
cds. NID: g2957145. >gp:gp|Z99108|BSUB0005_76 Bacillus subti
lis complete genome (section 5 of 21): from 802821 to 101125
0. NID: g2633055. >gp:gp|D78509|D78509_4 Bacillus subtilis Y
fjG-YfjR genes, complete cds. NID: g2780390.

atgagtgaagaacgcaagttaacatttatgggggctattaatgaagctattgaccaatct
atggaaaaagatgaggatgtcattttaattggtactgatgtctcaggtggtgcaaaagta
gaccacatcaaagatgacgatacattcggtggtgtatttggtgtaacaaaaggacttgca

aaaaaatatagtcgtaaacgtgtaatcgatacaccaattgctgaacacattacattgagc
acggcagtaggagctgctgcgacagggctacgtccaattgctgaactcatgttcaacgac 
tttattggatttggtttagatccaattttaaatcaaggggcaaaaatgagatatatgttt 
ggtggaaaagccaaaatcccactagttgtacgtactcttcatggagcaggggcaagcgct 
gctgcacagcactctcagtctttatataatatgtttgcagcaattccaggagttaaggtt

gttgttccatctaatccatatgatgcgaagggtctactgatgtcagctattcaagaggac
aatcttgttgtcttttcagaagataaaacattattaggacaaaaaggtaatgttcctgaa 
gaaccttatactatagaaattggtaaagccaatgtgacgcgtgaaggtgacgatttaaca
attgtggctattggaaaaatggtagctgtagcggaagaaactgctgaaaaacttgcagaa 
gaccaagtatcagttgaggtcatcgatttacgctcagtgtcaccatgggatcaagaaaca 

gttttagattctgtgaagaaaacgggtcgcttaattgttattgacgaatctaatccacag
tgtaacattgctggagacgttgcttcagtgattggagatgtaggatttgattacttagat 
ggtccaattaagaaagtgaccgcaccagacactcctgtaccatttgcagcgaacttagag 
gcggcatatatgccgaatgctgataaggtattagacattgcatctgaattaattgatgat 
ttaaaaaaggctaacgcatag 


Sequence 2514 

MSEERKLTFMGAINEAIDQSMEKDEDVILIGTDVSGGAKVDHIKDDDTFGGVFGVTKGLA 
KKYSRKRVIDTPIAEHITLSTAVGAAATGLRPIAELMFNDFIGFGLDPILNQGAKMRYMF 
GGKAKIPLVVRTLHGAGASAAAQHSQSLYNMFAAIPGVKVVVPSNPYDAKGLLMSAIQED
NLVVFSEDKTLLGQKGNVPEEPYTIEIGKANVTREGDDLTIVAIGKMVAVAEETAEKLAE
DQVSVEVIDLRSVSPWDQETVLDSVKKTGRLIVIDESNPQCNIAGDVASVIGDVGFDYLD 
GPIKKVTAPDTPVPFAANLEAAYMPNADKVLDIASELIDDLKKANA* 

Sequence 2515 
Contig_0767_pos_4613_3354,

is similar to (with p-value 3.0e-72)

>gp:gp|Z99108|BSUB0005_77 Bacillus subtilis complete genome
(section 5 of 21): from 802821 to 1011250. NID: g2633055. >g
p:gp|D78509|D78509_3 Bacillus subtilis YfjG-YfjR genes, comp
lete cds. NID: g2780390.

atgccaaagcttggaatgacaatgaaagagggaactgttgaagagtggtttaaatcagag 
ggtgacaccgtaaaacaaggagagagtattgttacaataagctctgaaaaattaaccaac
gatgttgaagcgccggcgagtgggacattgttagaaattaaagtgcaagccggagaagat 
gcagaggttaaagcggtattaggtataattggagaagaaggggaagctattgataaagat 
ga'.g^tgatttagcatcagaaaaagtaaaagaagacaacgagcatgagaaggaaa^jgcaa
gaagttaaagatacatcacaacagtcttccgataataaagataattcgcctaaaagcgca 
gcacgagaaagaatctttatctcacccctcgcacgtaatatggctgaggataaaggatta 
gacattaacaagataaaaggcacaggcggtaatcatcgtattacaaaactagatattcaa 
cgtgttgaagcaaatgggtacgactatgctagtgatacgacatctaatgaagatacaagt
catgttccaacacagactgtggatacaagtgcgattggtgaaggattgaatcctatgcgt 
caacgtattgctcaaaacatgagacaaagtcttaatagtactgctcaattaacattacat 
cgtaaggttgatgcggatcgcttgctagatttcaaagacagattagctacggaacttaaa 
caagcagatcaagatgttaaattaactgttactacattattagctaaagcagtagtgctt 

gcacttaaagaatatggggcaatgaatgctcgctatgaacaaggcgagttaactgagtat
gaagatgttcatttaggaatcgcaacgtctctagatgaaggccttatggtgccagtgatt 
aatcatgcagatacaaaaagtatcggcactttagcccatgaaattaaatcatcggctgag 
gctgttcgggaaggaaacacaggagcagtacaattagagggagcaacatttacaattact 
aatatgggtgctagtggtatagaatactttacaccaattttaaatttaggtgaaacaggt 

attctaggcgttggtgctttaactaaagaagtcgtgctagaagcggataacattaaacaa
gtttcaaaaattcctttaagcttgacatttgatcatcaaattttagatggtgcaggtgcg 
gccgattttcttaaagtactagctaaatatatcgaaaacccttatttattaatgttatag 



Sequence 2516

MPKLGMTMKEGTVEEWFKSEGDTVKQGESIVTISSEKLTNDVEAPASGTLLEIKVQAGED 
AEVKAVLGIIGEEGEAIDKDEDDLASEKVKEDNEHEKETQEVKDTSQQSSDNKDNSPKSA 
ARERIFISPLARNMAEDKGLDINKIKGTGGNHRITKLDIQRVEANGYDYASDTTSNEDTS
HVPTQTVDTSAIGEGLNPMRQRIAQNMRQSLNSTAQLTLHRKVDADRLLDFKDRLATELK 

QADQDVKLTVTTLLAKAVVLALKEYGAMNARYEQGELTEYEDVHLGIATSLDEGLMVPVI
NHADTKSIGTLAHEIKSSAEAVREGNTGAVQLEGATFTITNMGASGIEYFTPILNLGETG 
ILGVGALTKEVVLEADNIKQVSKIPLSLTFDHQILDGAGAADFLKVLAKYIENPYLLML*



Sequence 2517

Contig_0767_pos_2671_2159,

putative peptide of unknown function 

atgaaaaggtttgcaaaagcatttgtcgtaagtggtattactttaggtgcagttttaggt 
ttaaacgtaacagagcataatggtgtatctaatgaagcaaaggcacaaacagcacacagt 

tactggtataaatataatggttatactgcatcgggtggcgactttgtacttagcaattca
ttttatcaaggtttaaaagctggaaacgttacatttaatggtattaaggtaaatcaaaaa 
tatgaatctaagactgctactaaaaaaatatacgatcagacatttcaacaaattaatgga 
aataaagcaaacaacgtacaatttaaaattgcttccagaactgttactttagatcaagtt 
aaacaaaagtatggaaaaaattataattatcagccgtcattatctaaaaacaaaacaagt 

aagacagatggcttgtacggttatcaagtcggaaaaggaaacatcgttttccacgt iiaaa
gatgcgtatgtcacaagtgctacattgtcataa

Sequence 2518 

MKRFAKAFVVSGITLGAVLGLNVTEHNGVSNEAKAQTAHSYWYKYNGYTASGGDFVLSNS 
FYQGLKAGNVTFNGIKVNQKYESKTATKKIYDQTFQQINGNKANNVQFKIASRTVTLDQV
KQKYGKNYNYQPSLSKNKTSKTDGLYGYQVGKGNIVFHVKDGYVTSATLS* 

Sequence 2519 

Contig_0767_pos_1876_1568,

putative peptide of unknown function

atgagtgagaatatcattaataataaaatgatagagaatggtagacctgtaataactgat 
gcagtttgtaaacttttaagaccacctgcaattgtcatagcaactgaaatcgccccaata 
agtagaccccataggactttatgttttaaagttggattgatagaaccacctgtagccata 
cctgcaacaatatgcgtagttgaatcagcacttgttacaataaagatgaaaataagtaca 

atagctaaggaactagtgacttccgacaatggaaaatgtgataataattcaaacaaggca
acagtataa 

Sequence 2520 

MSENIINNKMIENGRPVITDAVCKLLRPPAIVIATEIAPISRPHRTLCFKVGLIEPPVAI 
PATICVVESALVTIKMKISTIAKELVTSDNGKCDNNSNKATV*

Sequence 2521 

Contig_0768_pos_2298_2642,
putative peptide of unknown function

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc 
tacacacgtgctacaatggacaatacaaagggtagcgaaaccgcgaggtcaagcaaatcc 

cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg 
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc 
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 2522 

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES 
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR*

Sequence 2523 

Contig_0768_pos_7279_6914,

is similar to (with p-value 1.0e-27)

>gp:gp|U96108|SCU96108_3 Staphylococcus carnosus (3R)-hydrox
ymyristoyl acyl carrier protein dehydrase homolog (fabZ) gen
e, partial cds, YwpF homolog, single-strand binding protein
homolog (ssb), SceD precursor (sceD), SceA precursor (sceA)
and SceE precursor (sceE) genes, complete cds, and TenA homo
log (tenA) gene, partial cds. NID: g2735509.

atgacgaaagatgctcaaatctatgaaaaggaggataacaaaatagcaacattttgcgtt 
gctacagaaagaaactacaaagatgatattaatgaaatttctacagattatttactttgt 
aaagctttcggtaaaactgccactaatattgaaaagtacactagtcaaggtactttagta 
ggtataactggacaaatgcgttcaagaaaatatgaaaaagaaggccaaacacattttgtt 

acagaattatacgttgaaacaataaaattcatgtccccaaaaaataaaaacaatgaaact
ccctctgataatcaatttgaaaacaacacttatcaacctgatgatttagaaataattcat 
att-.t^a 

Sequence 2524 

MTKDAQIYEKEDNKIATFCVATERNYKDDINEISTDYLLCKAFGKTATNIEKYTSQGTLV
GITGQMRSRKYEKEGQTHFVTELYVETIKFMSPKNKNNETPSDNQFENNTYQPDDLEIIH 
I* 

Sequence 2525 
Contig_0768_pos_6526_5867,

is similar to (with p-value 5.0e-49)

>gp:gp|U96108|SCU96108_4 Staphylococcus carnosus (3R)-hydrox
ymyristoyl acyl carrier protein dehydrase homolog (fabZ) gen
e, partial cds, YwpF homolog, single-strand binding protein
homolog (ssb), SceD precursor (sceD), SceA precursor (sceA)
and SceE precursor (sceE) genes, complete cds, and TenA homo
log (tenA) gene, partial cds. NID: g2735509.

atgaaaaaaacattagttgcatcatcattagctataggactaggcgttgtagcaggtaac 
gcaggtcatgacgcacatgcaagtgagactacaaatgttgataaagcagagttagctcaa 

aaagcgttaactaatgatcaatcactaaatgaaagccctgttcaagaaggtgcttataat
attaattttgattacaatggtaattcatatcattttgaatctgatggttctactta^agt 
tggavTct^'.tgaatcaacaaacaatgctacgcaacctgttcaaccaagccaatctc.-agta 
gctacacaacaacaacctgtacaagtatcagcaccacaaaatgagcaaacagcacaacca 
caaacaaaatcaacatctacaagtcaaacttcttcaagtaaagcttcaagtggttcatca

gtaaatgtaaattcacatttacaacaaattgctcaacgtgaatcaggtggcgatattcat
gcaataaacccatcttcaggtgcagcaggtaaatatcaattcttacaatcaacttgggat 
tctgttgctccaagtcaatataaaggtgtttcacctgcaaaagctccagaaagcgtacaa 
gaccgagcagcagtaaaattatataatactggtggcccaggtcattgggtaactgcataa 
Sequence 2526

MKKTLVASSLAIGLGVVAGNAGHDAHASETTNVDKAELAQKALTNDQSLNESPVQEGAYN
INFDYNGNSYHFESDGSTWSWSYESTNNATQPVQPSQSQVATQQQPVQVSAPQNEQTAQP
QTKSTSTSQTSSSKASSGSSVNVNSHLQQIAQRESGGDIHAINPSSGAAGKYQFLQSTWD
SVAPSQYKGVSPAKAPESVQDRAAVKLYNTGGPGHWVTA* 

Sequence 2527 

Contig_0768_pos_5390_4893,
is similar to (with p-value 3.0e-44)

>gp:gp|U96108|SCU96108_7 Staphylococcus carnosus (3R)-hydrox
ymyristoyl acyl carrier protein dehydrase homolog (fabZ) gen
e, partial cds, YwpF homolog, single-strand binding protein
homolog (ssb), SceD precursor (sceD), SceA precursor (sceA)
and SceE precursor (sceE) genes, complete cds, and TenA homo
log (tenA) gene, partial cds. NID: g2735509.

atggaagatgttaaatttctagtagaacaaattgagtttatgttagaaggagaagttgaa 
gcgcatgaggttctagctgatttcattaatgaaccatatgaagaaatagtaaaagaaaaa 
gtatggccaccaagtggtgatcattatattaaacatatgtacttcaatgcatttgcacgt 

gaaaatgcagccttcacgattgcagcgatggcaccctgtccatacgtctacgctgtcatt
ggtaaacgtgcgatggaagatcccaaattaaataaagaatcagtgacttctaaatggttt 
caattttatagtactgaaatggacgaacttgttgatgtgttcgatcaattgatggaccgt 
ttaactaaacattgtagtgagacagaaaaaaaagagattaaagaaaatttcttgcaaagt 
actattcatgagagacatttcttcaatatggcatatattaatgaaaaatgggaatatggg 

ggaaataacaatgaataa

Sequence 2528 

MEDVKFLVEQIEFMLEGEVEAHEVLADFINEPYEEIVKEKVWPPSGDHYIKHMYFNAFAR 
ENAAFTIAAMAPCPYVYAVIGKRAMEDPKLNKESVTSKWFQFYSTEMDELVDVFDQLMDR
LTKHCSETEKKEIKENFLQSTIHERHFFNMAYINEKWEYGGNNNE*

Sequence 2529 

Contig_0768_pos_4639_4079,

is similar to (with p-value 8.0e-41)

>sp:sp|P44697|THID_HAEIN PHOSPHOMETHYLPYRIMIDINE KINASE (EC
2.7.4.7) (HMP-PHOSPHATE KINASE) (HMP-P KINASE). >pir:pir|I64
151|I64151 hypothetical protein HI0416 - Haemophilus influen
zae (strain Rd KW20) >gp:gp|U32725|U32725_3 Haemophilus infl
uenzae Rd section 40 of 163 of the complete genome. NID: g15
73387.

atggagttgattcaaagttacttaaaggaacattccaatattccttatgtgattgatcca 
gttatgcttgccaaaagtggtgattcattgatggatgaaaatactaaaaatcatttgcaa 
tc»\ac.at tattacctttggctgatgttgtcactccgaatatacctgaagctgaggaaatt 
actggtactaaaattaatgatgaagaaagcatacgtaaagcaggtcaaatctttactaat

gaaattggtagtaagggtgttgtaattaagggagggcattcagccgatttaaataatgct
aaagattttctttttactaagaatgaaacgtatacctttgagaacaaacgctttgatact 
aaacatactcatggaactggttgtactttttcagcagttattacagcagaattagctaaa 
ggtcgttccataaaagatgcagttaaaaaagcaaaagagtttatttcattaagtattgaa 
cataccccagaaattggcaaggggagaggacctgtaaatcattttgcttatatgaagaaa 

gtaggtttagatgatgaataa

Sequence 2530 

MELIQSYLKEHSNIPYVIDPVMLAKSGDSLMDENTKNHLQSTLLPLADVVTPNIPEAEEI 
TGTKINDEESIRKAGQIFTNEIGSKGVVIKGGHSADLNNAKDFLFTKNETYTFENKRFDT
KHTHGTGCTFSAVITAELAKGRSIKDAVKKAKEFISLSIEHTPEIGKGRGPVNHFAYMKK
VGLDDE* 

Sequence 2531 

Contig_0769_pos_1300_1833,

is similar to (with p-value 1.0e-69)

>sp:sp|P37948|GLPT_BACSU GLYCEROL-3-PHOSPHATE TRANSPORTER (G
-3-P TRANSPORTER) (G-3-P PERMEASE). >pir:pir|S37250|S37250 g
lycerol-3-phosphate transport protein - Bacillus subtilis >g
p:gp|Z26522|BSGLPTQ_1 B.subtilis glpT and glpQ genes for gly
cerol 3-phosphate permease and glycerophosphoryl diester pho
sphodiesterase. NID: g403371. >gp:gp|Z99105|BSUB0002_43 Baci
llus subtilis complete genome (section 2 of 21): from 194651
to 415810. NID: g2632457. >gp:gp|AB006424|AB006424_44 Bacil
lus subtilis genomic DNA, 70 kb region between 17 and 23 deg
ree. NID: g3599592.

atgtttaatttcttaaaaccagctagacatatcaaatcattgccatcagaaaaagtagat 
gatacgtataaaagactacgctttcaagtctttttaggaatatttatcggttatgctggt 
tactatttattaagaaagaacttctctttagcaatgccttcattaattgagcaaggcttt 

ag'. asagaggaattaggtattgcattatctgcagtatctatcgcatatggctttaacaaa
tttgtaatgggcactgtcagcgatcgaagtaatgctcggatgttcttaactttaggttta
gtattgacagcaattattaacttattattaggatttattccattctttacttcaagcata 
actattatgtttatcatgctgtttttagttggatggttccaaggaatgggctggccacca 
tctggacgtgtgttagttcattggtttagtgtcagcgaacgtggaagcaaaacgtcaata 

tggaatgtagcacataatgtaggcggaggtttaatggcacctattgctacgtag

Sequence 2532 

MFNFLKPARHIKSLPSEKVDDTYKRLRFQVFLGIFIGYAGYYLLRKNFSLAMPSLIEQGF 
SKGELGIALSAVSIAYGFSKFVMGTVSDRSNARMFLTLGLVLTAIINLLLGFIPFFTSSI 
TIMFIMLFLVGWFQGMGWPPSGRVLVHWFSVSERGSKTSIWNVAHNVGGGLMAPIAT*

Sequence 2533 

Contig_0769_pos_1843_2658,

is similar to (with p-value 2.0e-72)

>sp:sp|P37948|GLPT_BACSU GLYCEROL-3-PHOSPHATE TRANSPORTER (G
-3-P TRANSPORTER) (G-3-P PERMEASE). >pir:pir|S37250|S37250 g
lycerol-3-phosphate transport protein - Bacillus subtilis >g
p:gp|Z26522|BSGLPTQ_1 B.subtilis glpT and glpQ genes for gly
cerol 3-phosphate permease and glycerophosphoryl diester pho
sphodiesterase. NID: g403371. >gp:gp|Z99105|BSUB0002_43 Baci
llus subtilis complete genome (section 2 of 21): from 194651
to 415810. NID: g2632457. >gp:gp|AB006424|AB006424_44 Bacil
lus subtilis genomic DNA, 70 kb region between 17 and 23 deg
ree. NID: g3599592.

atgactgcattatacaacttcggttatttaaaagggtttgaaggtgtctttatataccct
gcactattggttatcattattgccatcttgtcttacatactaattagagatacaccacaa 
tctcagggtttaccaccaattgagcagtataaaaatgattatgccacttcaactaaacaa 
acaattgaaacagaactaactacgaaagaaatattatttaaatatgtacttaataacaaa 
tgggtatgggcgattgcttttacaaacattttcgtttattttgtgcgttatggtgttttg 

gactgggctccgacatacttaagtgaggaaaagcattttgatttaagtgcttcaggttgg
gcttacttcttatacgaatgggcaggaattccgggcacgctactctgtggttatctatct
gacaaactatttaaaggtcgccgtggtccagcaggcttcttctttatgttaggcgtaaca 
atctttatccttatatattggttaaatccaccaggtcacgcatggttagataatctttca 
ttaattggtattggtttcttaatttatggtccagtcatgttaattggtttacaagcgtta 

gattatgttcctaaaaaagcagctggcactgcggctggtctaactggtctatttggatac
ttattcggtgcagttatggctaatattgtattagggtttgtagttcaacattttggatgg 
catattggctttgtgttattaacagtcatcagcatactcgctatgttatgtttcatttta 
acttggaataaacgtggtcaagaacaaatcgactag 

Sequence 2534

MTALYNFGYLKGFEGVFIYPALLVIIIAILSYILIRDTPQSQGLPPIEQYKNDYATSTKQ
TIETELTTKEILFKYVLNNKWVWAIAFTNIFVYFVRYGVLDWAPTYLSEEKHFDLSASGW 
AYFLYEWAGIPGTLLCGYLSDKLFKGRRGPAGFFFMLGVTIFILIYWLNPPGHAWLDNLS 
LIGIGFLIYGPVMLIGLQALDYVPKKAAGTAAGLTGLFGYLFGAVMANIVLGFVVQHFGW
HIGFVLLTVISILAMLCFILTWNKRGQEQID*

Sequence 2535

Contig_0769_pos_4799_5716,
putative peptide of unknown function

atggttatttcaaagaaattagcatttatcactataggtatcttaatatgtattctttta 
agttcacctctgatcgatcaacagatttcaaattatttcatgaatcaagattcaattttc 
ggtactttatttcaaaactacggtttatttccacctacgctgatattaattatttcaact
gtaattttaaattattatattttaactacatttcaaaataaattagcaaaaacgttaact

ttgattttatcttttatctttacgttgataaaaaccaatgaattcgtgtcagaaaccgct
caatatatgttatctacatcagaaaacatcaaaaatcataagccaatgggtatggcaaat 
aatgaaggaaacgctggaaatgctttatctttaggaatgagtttttttatttctttgatt 
atcattataattatcactattatttgttatcagttttggttaaaacatacaaataaccaa 
gaacttgatcatttattcaaagtttcacttattagctttatgattttatttattggatta 

gaactcgttgatagcttgaaacatttatggggacgttttagaccatatgaaatcactgac
aaagctggacatttcactcactggcttacaataaatggaaacactggacatagttccttc 
ccttcaggacacactggaaatggtgcatttctaatgtttatcgcattttactttaaaaaa 
ttacgcactcaaaaaatagtgtttagtatcggattatgttatagtattttaatggcattg 
agcagaattcgtattggtgcacatttcactagtgatgtaacaatgagcttgcttattatg 

ttttcactcatggttattgcagattttatcattaatcaagtcatatctatacataaaaac
aagctaagtaaagtctag 

Sequence 2536 

MVISKKLAFITIGILICILLSSPLIDQQISNYFMNQDSIFGTLFQNYGLFPPTLILIIST 
VILNYYILTTFQNKLAKTLTLILSFIFTLIKTNEFVSETAQYMLSTSENIKNHKPMGMAN
NEGNAGNALSLGMSFFISLIIIIIITIICYQFWLKHTNNQELDHLFKVSLISFMILFIGL 
ELVDSLKHLWGRFRPYEITDKAGHFTHWLTINGNTGHSSFPSGHTGNGAFLMFIAFYFKK
LRTQKIVFSIGLCYSILMALSRIRIGAHFTSDVTMSLLIMFSLMVIADFIINQVISIHKN 
KLSKV* 


Sequence 2537 

Contig_0769_pos_7866_7003,

putative peptide of unknown function 

atgccgttaatcttaggtatcgaaaaaggttggtttaaagaagaatcattagaaattgag 
atgattgaaccaaaggaacactttgacgcactagacgagattgaaaatggttcaatggat
atcgcgattactgaacctattcatctggttcaagatagagctaaagaacaaaaagtcatc 
gggtttgcgagatatcttcacactaatggaggtattatgtataacaaagataaaaatatc 
gctcgccccaaagatttaatcggtaaacggcttcaatatcccggtgctcccggtccaggt 
ggtattgctatggctaaaacgatgattgaagctgatggtggtacatttgaagaaggtgac 
atcacaccagttaatcatagtttttatcatactgatgcacttttaactgataaagctgat
gctgctacactcattttcgaaaattttgaaattcttgaagctagaaatcaaggacttaat 
gtagattattttccacttaaaaattataatgtacccgatttttgtcaactcattttcatt 
acaacacctgaagtattaaatacgagtgaggttaaactcaaaaagttcctaaaaatcatt 
caaaaaacaattcattacatcaatcatcatttagaaagtgcaattgaaatttactcaacg 
tatactcaaaccgatatctctaaccaattaaacaaagatacaattgaagcaacagctaaa
tgttttacaaatgatttgtctatgagttccgactactacaatgatttacagttgtggctt 
aaagaagtcaatgatattaaagacacgattaacccaactttatattttacaaatcaacta 
ctattcagtcgatatacaaaataa 

Sequence 2538

MPLILGIEKGWFKEESLEIEMIEPKEHFDALDEIENGSMDIAITEPIHLVQDRAKEQKVI 
GFARYLHTNGGIMYNKDKNIARPKDLIGKRLQYPGAPGPGGIAMAKTMIEADGGTFEEGD 
ITPVNHSFYHTDALLTDKADAATLIFENFEILEARNQGLNVDYFPLKNYNVPDFCQLIFI 
TTPEVLNTSEVKLKKFLKIIQKTIHYINHHLESAIEIYSTYTQTDISNQLNKDTIEATAK 

CFTNDLSMSSDYYNDLQLWLKEVNDIKDTINPTLYFTNQLLFSRYTK*

Sequence 2539 

Contig_0769_pos_6570_6241, 

putative peptide of unknown function 
atgggtttgaagagaaattggaggttgagaaagatgtatgtagagagaaagcctgcactt
tatatcgaagatttacgagacgaattcaaaaatagcttaaaacattttaaagatgatgat 
gaagcatttaatacattagttggatttgtggaactggaccacctatattcttcagcactt 
aaagaaataagtacaaaattaagtattttagatgataactttaactatcaatataaacac 
aatccaattcatcatatggaacgacgagtaaaagagatgcacagtttagttaaaaagttg
aatcgtaaaggatttgaagtaagtgcataa 

Sequence 2540 

MGLKRNWRLRKMYVERKPALYIEDLRDEFKNSLKHFKDDDEAFNTLVGFVELDHLYSSAL 
KEISTKLSILDDNFNYQYKHNPIHHMERRVKEMHSLVKKLNRKGFEVSA*

Sequence 2541 

Contig_0769_pos_4499_3618,

putative peptide of unknown function 

atgtatatgattaagcaaaattttacaactaaagtttatccacttaaacaatttgaggga
gaatatagtgaagttgcctacgcaggagatagccaagctgatgatgtgattgtatttatt 
catggtgcacttttaacatataaaattatgacgatgttcgaaccttacttcagagai:tac 
aaatcaatatttattaattgtccaagtcgtggtagaagttcagatttagatcgtgacaca
catacattagatgattacgctgcacgtatatatgatgtattaacgcaaattgttaaggag 

caacaaataaaagaactgagtattgtcggttattcaatgggaggaatgattgcgacacgg
ttacttaagtataatacattaccagtctctcatcttatttatttacatagcgcagcgaaa 
attactccagatgcaagtatgttagcacgattattcactagtgagagtaagagagcagtg 
ttaaaagatgaaattaaggcagtgaaaaatcttcctcaatatatactagataaaacgatt 
tatgcacaaaaggaaaacgcacttgatttggtacaatttattgcacctattaaaactata 

attacggatatgatttacacgattaatacagattatttaccagatatcgatgagattaaa
caatttccgaaaatattatttatgtctggaaaagaagatcaaattattccttatacggat
tctcaagctacgttagaaaagtttaaggcgttaggtggagaaactaaagaagttatttat 
ccaggaattggtcatatcgatttcccaagtgttttagaaacccaatcggatggacaaact 
ggcgtggtagatgaaattaaagcgtggatttcaaaaaaataa 


Sequence 2542 

MYMIKQNFTTKVYPLKQFEGEYSEVAYAGDSQADDVIVFIHGALLTYKIMTMFEPYFRDY 
KLIFINCPSRGRSSDLDRDTHTLDDYAARIYDVLTQIVKEQQIKELSIVGYSMGGMIATR 
LLKYNTLPVSHLIYLHSAAKITPDASMLARLFTSESKRAVLKDEIKAVKNLPQYILDKTI 
YAQKENALDLVQFIAPIKTIITDMIYTINTDYLPDIDEIKQFPKILFMSGKEDQIIPYTD
SQATLEKFKALGGETKEVIYPGIGHIDFPSVLETQSDGQTGVVDEIKAWISKK* 

Sequence 2543

Contig_0769_pos_3394_2816,

putative peptide of unknown function

gtggagaaacttgaactgattttactacttaaggaaataggttgcacaattaaagaaatt 
aaagtgttattaaaagatgattcatcaatgaaatctctttacacgtttcttcaaatcaaa 
aagcatgaaatacaacagtcaataacagataaagaaaacaaggtatgtaaaatagagcaa 
attcaacgctatgttcatcaaaattcaatttctccaattcattatttacaagatatagct 

atttatatggaagaatctcatagactcaaaggtgtcagaaagaaattatggttaagtata
gctctgattggttcattacaatatggtggtttgatcacctcaattgtaactcaaaggaaa 
aagcctttcttgtcgatgatgcctgtggtagctatgtattcactttggttaacgaaaaag 
tataaaaagaatgtttcatatgtctgtcctaattgccaccatgtgtttaatcctagtgtc 
attcattttgtaacagcatcacatacacctaaaacaagaaagcttcaatgtcctgattgt 

cacgaaatgcattactgtgtagaaattgctaagctatga

Sequence 2544

VEKLELILLLKEIGCTIKEIKVLLKDDSSMKSLYTFLQIKKHEIQQSITDKENKVCKIEQ 
IQRYVHQNSISPIHYLQDIAIYMEESHRLKGVRKKLWLSIALIGSLQYGGLITSIVTQRK 
KPFLSMMPVVAMYSLWLTKKYKKNVSYVCPNCHHVFNPSVIHFVTASHTPKTRKLQCPDC
HEMHYCVEIAKL* 

Sequence 2545 

Contig_0769_pos_850_251,

is similar to (with p-value 1.0e-51)

>sp:sp|P10585|GNTR_BACSU GLUCONATE OPERON TRANSCRIPTIONAL RE
PRESSOR (P28 PROTEIN). >pir:pir|C26190|C26190 transcriptiona
l repressor GntR - Bacillus subtilis >gp:gp|AB005554|AB00555
4_4 Bacillus subtilis genomic DNA, 36 kb region between gnt
and iol operons. NID: g2280496. >gp:gp|J02584|BACGNT_1 B.sub
tilis (gluconate operon) gntR, gntK and gntP genes encoding
gnt repressor, gluconate kinase and permease, and gntZ gene.
NID: g143013. >gp:gp|Z99124|BSUB0021_110 Bacillus subtilis
complete genome (section 21 of 21): from 3999281 to 421-814.
NID: g2636442.

gtgagtggtgagattgaatcccatatacaactaagtgaaaatcaagttgcacaacaattt 
aatgttagtcgttcaccagtcagagatgcatttaagttacttcaaacagatcaactgatt 
caattagaacgtatgggcgctcaggtacttccttttggtgatcaagagaaaaaggaaatg 

tatgatttgcgtttgatgctcgaatcattcgccttttcaaaattaagtggaacagataca
caacatattattaaggaaatgaaaaagcaattagaaatgatgaaggttgcagtccaattt 
gaggatgctgaagcatttacacaacatgattttgagtttcatgaggtgatgattcaagca 
acaaaccatcagtatcttaaagtgttttggaatcaccttaaacctgttatggaatcactc 
atactcatttcaatgagacaaagaatggcaaatgaccccaaagatttcgagagaattcat 

aaaaatcatcaagtttttatagatgctgttgagaacgatgatgcctccatattgagaaaa
gcattccatttaaattttgatgatgtaggagaaaatattgaagcattctggttacgttaa 



Sequence 2546

VSGEIESHIQLSENQVAQQFNVSRSPVRDAFKLLQTDQLIQLERMGAQVLPFGDQEKKEM
YDLRLMLESFAFSKLSGTDTQHIIKEMKKQLEMMKVAVQFEDAEAFTQHDFEFHEVMIQA 
TNHQYLKVFWNHLKPVMESLILISMRQRMANDPKDFERIHKNHQVFIDAVENDDASILRK 
AFHLNFDDVGENIEAFWLR* 

Sequence 2547

Contig_0770_pos_3341_3688, 

is similar to (with p-value 3.0e-41) 

>sp:sp|P37527|YAAD_BACSU 31.6 KD GUANYLYLATED PROTEIN IN DAC
A-SERS INTERGENIC REGION. >gp:gp|D26185|BAC180K_75 B. subtil
is DNA, 180 kilobase region of replication origin. NID: g467
326. >gp:gp|Z99104|BSUB0001_11 Bacillus subtilis complete ge
nome (section 1 of 21): from 1 to 213080. NID: g2632267.

atgtctaaaatagtaggatcagatcgagttaaaagaggaatggctgaaatgcaaaaaggc 
ggtgtcattatggacgtcgttaatgcagaacaagctaaaattgctgaagaagccggagct 

gttgccgtaatggcattagagcgtgtaccatcagatattcgtgctgctggcggtgttgca
cgtatggcgaatcctaaaatagttgaagaagttatgaatgccgtatcaattccggttatg 
gctaaagccagaattggtcatattacagaagctagagttttagaatcgatgggtacacgg 
tttcaagaacccaacatactacaaacgaatttcaaaaggcgagagtaa 

Sequence 2548

MSKIVGSDRVKRGMAEMQKGGVIMDVVNAEQAKIAEEAGAVAVMALERVPSDIRAAGGVA
RMANPKIVEEVMNAVSIPVMAKARIGHITEARVLESMGTRFQEPNILQTNFKRRE* 

Sequence 2549
Contig_0770_pos_5637_5939,

is similar to (with p-value 3.0e-42)

>gp:gp|AF016634|AF016634_1 Lactococcus lactis cremoris ClpB
chaperone homolog (clpB) and phosphoribosylformylglycinamide
cyclo-ligase (pur5) genes, complete cds; and phosphoribosyl
glycinamide formyltransferase (pur3) gene, partial cds. NID:
g3150045.

atgattcgaccttcagattcttttacttctttcaatactgcttttaaacgttcttcaaat 
tcacctctatattttgcacctgcaactaaggcacttaaatcaagctcgaaaatcgtttta 
tcgagtaatgattctggaacgtctttacgtacaattcgttgtgctaaaccttcaacaatt 
gcagctttacctacacctggttcaccgattaaaaccggattattttttgtttttccactt
aatatacgaattgtattacgtatttcttcatctctaccgatgacagggtccattttacct 
tga 

Sequence 2550

MIRPSDSFTSFNTAFKRSSNSPLYFAPATKALKSSSKIVLSSNDSGTSLRTIRCAKPSTI 
AVLPTPGSPIKTGLFFVFRLNIRIVLRISSSLPMTGSILP* 

Sequence 2551 
Contig_0770_pos_4797_3853,

is similar to (with p-value 0.0e+00)

>gp:gp|AF016634|AF016634_1 Lactococcus lactis cremoris ClpB
chaperone homolog (clpB) and phosphoribosylformylglycinamide
cyclo-ligase (pur5) genes, complete cds; and phosphoribosyl
glycinamide formyltransferase (pur3) gene, partial cds. NID:
g3150045.

gtggaaacagaaagagaaaagctattaagtttaagtgacatcttacacaaacgtgtagta 
ggtcaagataaagcagttgatttagtatcagacgcagtagttagagcacgtgctggaatt 
aaagatccgaatagaccaatcggaagtttcttattcttaggacctactggagtaggtaaa 

actgaattagcaaaatcgcttgcttcatcacttttcgattctgaaaaacatatgattaga
attgatatgagcgaatatatggaaaaacatgctgtatcacgtttaattggtgcacctcca
ggttatgtaggtcacgatgaaggtggtcaattaactgaagcagttagacgtaatccatac 
tcagttattttgttagacgaagttgaaaaagcacatagcgatgtttttaatgtattactt 
caaatactagatgaaggtcgtcttacggattctaaaggtagaagtgttgactttaaaaat 

accattatcatcatgactagtaatattggttcacaagtattacttgaaaatgtaaaagat
gctggtgaaattagtgatgatacagagaaagcagttatggacagtctacatgcatacttc 
aaacctgaaatattaaatcgtatggatgacatcgtgttatttaaaccattatcagttaat 
gatatgagtatgattgtagataaaattttaacacaattaaatatgagattattagatcaa 
catatctcaattgaagtgacagaagaagcgaaaaaatggctaggtgaagaagcgtatgaa 

ccacaatttggtgcaagaccattaaaacgctttgttcaacgacaaatagaaactccaatt
gcacgtatgatgattaaagaaagtctacctgaaggtacaataattaaagtagatttaaat 
gacaataaagaacttgattttaaggttgttaaacctacgtcttaa 

Sequence 2552 

VETEREKLLSLSDILHKRVVGQDKAVDLVSDAVVRARAGIKDPNRPIGSFLFLGPTGVGK
TELAKSLASSLFDSEKHMIRIDMSEYMEKHAVSRLIGAPPGYVGHDEGGQLTEAVRRNPY 
SVILLDEVEKAHSDVFNVLLQILDEGRLTDSKGRSVDFKNTIIIMTSNIGSQVLLENVKD
AGEISDDTEKAVMDSLHAYFKPEILNRMDDIVLFKPLSVNDMSMIVDKILTQLNMRLLDQ 
HISIEVTEEAKKWLGEEAYEPQFGARPLKRFVQRQIETPIARMMIKESLPEGTIIKVDLN 

DNKELDFKVVKPTS*

Sequence 2553 

Contig_0770_pos_1762_383,

is similar to (with p-value 0.0e+00)

>sp:sp|P39616|DHA2_BACSU PROBABLE ALDEHYDE DEHYDROGENASE YWD
H (EC 1.2.1.3). >pir:pir|S39713|S39713 hypothetical protein
- Bacillus subtilis >gp:gp|X73124|BSGENR_59 B.subtilis genom
ic region (325 to 333). NID: g413923. >gp:gp|Z99123|BSUB0020
_91 Bacillus subtilis complete genome (section 20 of 21): fr
om 3798401 to 4010550. NID: g2636240.

atgacaataattagagataaatttaacaatagtaaagctttttttaatacgcataaaaca 
aaaaaccttaaatttcgaaaacaacaacttaaattactaagtaaaaatatcaaaaatcat
gaaaatgaattattagatgccttatataaagatttaggtaaaagtaaggttgaagcatac 
gcaactgaaattggtatgcttttgaaaagcataaagctaatgcgcaaagagttaaaaaat 

tggtcgaaaaccaaacaaacggatacaccactctacttattccctacaaagagttatatt
aaaaaagaaccttacggtacggtgcttattataggaccatttaattatccggttcaatta 
gttttcgagcctctcatcggagcaatagctgccggaaatactgctatagttaaaccttca 
gagttaacacctcatgttgccattgtgatcaaggacatcattgaagatacatttgatgaa 
gcatacgtttctgttgttgaaggtggtattgaagaaacccaaacgttattaagtctscca 
tttgattatatgttctttactggcagtgaaaaagtcggaaaaattgtctatgaagctgca
gcaagaaaattaattccagttactcttgaacttggcggtaaatcacctgtcattgtcgat 
gatacagccaatatcaaagtagcgagtgaacgtattagttttggtaaatttactaatgct 
ggtcaaacatgtgtcgctccagattatatattagttcagcggaaagttaaaaatgattta 
ataaaagctcttaaaaaaacaattactgaattttacggagaaaatattgaaaaaagccct
gatttcggacggattgttaatcaaaaacactttaatcggttgaatgacttgattcaaatt 
cataaagataatgttgtttttggaggtaatagttctaaagaagatttatatattgaacct 
actttattggataacataaccaatgacaataaaatcatgaaagaagaaatattcggtccc 
attttgcctattattacttatgataatttcgatgaagtacttgaaatcatccaaagtaaa 
tcaaaaccactaagtttgtatctttttagcgaagatgaaaacatgacacatagagtggtt
gaagaattatcatttgggggcggtgcaattaacgatacgttaatgcatttagctaatcct 
aacttacctttcggtggtgtaggttcttcaggcataggtcaatatcatggtaagtattct 
tttgatacatttagtcatatgaaatcatacacatttaaatctacacgtctagaatcgagt 
ttatttttccctccatataaaggtaaatttaaatatattaaaaccttcttcaagaactag 


Sequence 2554 

MTIIRDKFNNSKAFFNTHKTKNLKFRKQQLKLLSKNIKNHENELLDALYKDLGKSKVEAY 
ATEIGMLLKSIKLMRKELKNWSKTKQTDTPLYLFPTKSYIKKEPYGTVLIIGPFNYPVQL 

VFEPLIGAIAAGNTAIVKPSELTPHVAIVIKDIIEDTFDEAYVSVVEGGIEETQTLLSLP
FDYMFFTGSEKVGKIVYEAAARKLIPVTLELGGKSPVIVDDTANIKVASERISFGKFTNA
GQTCVAPDYILVQRKVKNDLIKALKKTITEFYGENIEKSPDFGRIVNQKHFNRLNDLIQI 
HKDNVVFGGNSSKEDLYIEPTLLDNITNDNKIMKEEIFGPILPIITYDNFDEVLEIIQSK 
SKPLSLYLFSEDENMTHRVVEELSFGGGAINDTLMHLANPNLPFGGVGSSGIGQYHGKYS 

FDTFSHMKSYTFKSTRLESSLFFPPYKGKFKYIKTFFKN*

Sequence 2555 

Contig_0771_pos_3650_4303,
is similar to (with p-value 7.0e-69)
>gp:gp|AF008219|AF008219_3 Borrelia afzelii R-IP3 chromosome
right end, arcA and arcB genes, complete cds. NID: g26971H

atgaaaaatttacgtaacagaagctttttaactttattagacttttcacgacaagaggta 
gaatttttattaacactctccgaagatttgaagcgtgccaaatatatcggcactgaaaag 

cctatgctaaaaaataaaaatatcgcgcttctttttgaaaaagattccactagaacacgt
tgcgcattcgaagttgccgcacatgatcaaggtgcacacgtcacttatcttggacctaca 
ggttctcaaatgggtaaaaaagaaactgctaaagatacagcacgtgtacttggtggtatg 
tatgatggtattgagtaccgaggtttctctcaacgtactgtagaaacattagcgcaatat 
tcaggtgttccggtatggaatggattaaccgatgaagatcaccctacacaagtgcttgct 

gactttttaactgctaaagaagtattgaaaaaagagtatgctgatatcaactttacttat
gttggcgatggacgtaacaatgttgctaacgcattaatgcaaggtgctgccattatgggt 
atgaatttccatcttgtttgtcctaaagaactcaatccgacagaagaattattaaatcgt 
tgcgaacgtattgcgacggaaaatggcggtaacattttaatatcattactttaa 

Sequence 2556

MKNLRNRSFLTLLDFSRQEVEFLLTLSEDLKRAKYIGTEKPMLKNKNIALLFEKDSTRTR
CAFEVAAHDQGAHVTYLGPTGSQMGKKETAKDTARVLGGMYDGIEYRGFSQRTVETLAQY 
SGVPVWNGLTDEDHPTQVLADFLTAKEVLKKEYADINFTYVGDGRNNVANALMQGAAIMG 
MNFHLVCPKELNPTEELLNRCERIATENGGNILISLL* 


Sequence 2557 

Contig_0771_pos_5383_4310,

putative peptide of unknown function 

atgattaattcatttggaatggttgtaggtaatttgttaggtggttttttgtttgataaa 
cttggtggttacaaaacgattttaataggtacatttacgtgtttatgtagtaccacatta
ctcaacttgtttcatggctggccatggtatgcaatttggttagtactacttggatttggt 
ggcggaatgatagttcctgctatttatgcgatggcaggtgccgtttggcctaatggagga 
agacaaacttttaatgctatctacttagcacaaaatataggggttgctttaggcgcggca 
ttaggaggttttgtggctgaatttagtttcaattatatttttatggcaaacctcattatg 






tatgtactttttgccatcgtggcgattacacagttcaatttagagattaatgctaaattt 
aaaccacaagattcgatagatttaaaaagcaaagaaaataaaaaacgatttactgctatg 
atgctagtatgtgcaatgtttgcaatttgttggattgcatatattcaatgggaaacaacg 
atagcttcattcacacaatcaattaatatttcaatgtctcaatatagtgtattatggaca
attaatggaattatgattttagtagctcaacctttaataagaccaattattatctcatta
aaaggtaatttaaaacatcaaatgtttgtaggtattttaatttttatgagttctttccta 
gtgacgagttttgcaaatcactttgctatatttgtagttggcatggtcattttaactttt 
ggagaaatgtttgtttggcctgcagtaccaactatagcaaatcaacttgcaccagttgga 
aagcaagggcaataccaaggatttgttaattcagcatctacagtgggtaaagcatttgga 
ccatttattgggggtatacttgtagatacatttaatatgagtatgatgtttattgggatg
attatattattaagttttgcactgttatttttaagtttctatgataaagtgttacccaag 
aattttaaaaatcaacatcaatcaagaagacgacgaaatcagaatggtatttaa 

Sequence 2558 

MINSFGMVVGNLLGGFLFDKLGGYKTILIGTFTCLCSTTLLNLFHGWPWYAIWLVLLGFG
GGMIVPAIYAMAGAVWPNGGRQTFNAIYLAQNIGVALGAALGGFVAEFSFNYIFMANLIM 
YVLFAIVAITQFNLEINAKFKPQDSIDLKSKENKKRFTAMMLVCAMFAICWIAYIQWETT 
IASFTQSINISMSQYSVLWTINGIMILVAQPLIRPIIILLKGNLKHQMFVGILIFMSSFL
VTSFANHFAIFVVGMVILTFGEMFVWPAVPTIANQLAPVGKQGQYQGFVNSASTVGKAFG 

PFIGGILVDTFNMSMMFIGMIILLSFALLFLSFYDKVLPKNFKNQHQSRRRRNQNGI*

Sequence 2559 
Contig_0771_pos_2844_2416,
is similar to (with p-value 2.0e-50)
>pir:pir|S58181|S58181 fosB protein - Staphylococcus sp. >gp
:gp|X89875|SSPPDNAFB_1 Staphylococcus sp. plasmidic DNA for
fosB gene. NID: g927563.

atggaaataacaaatgttaatcatatttgtttttcagtgagtgatttaaatacctctata 
caattttataaagatattttacatggtgacttattagtatcagatagaacgacagcatat 

ttaactattggtcatacttggattgcactgaatctagaaaaaaatataccaaggaatgaa
ataagtcattcctatacgcacgttgctttctccatagatgaagaagattttcaacagtgg 
attcaatggcttaaagagaatcaagtaaatattttaaaagggcgaccaagagacattaaa 
gacaaaaaatcgatatattttacagatctggatgggcataaaattgaattacatactgga 
acattaaaagatagaatggaatattataaatgtgagaagacgcatatgcaattttacgat 

gagttttga

Sequence 2560 

MEITNVNHICFSVSDLNTSIQFYKDILHGDLLVSDRTTAYLTIGHTWIALNLEKNIPRNE 
ISHSYTHVAFSIDEEDFQQWIQWLKENQVNILKGRPRDIKDKKSIYFTDLDGHKIELHTG 
TLKDRMEYYKCEKTHMQFYDEF*

Sequence 2561 

Contig_0771_pos_1176_178,

is similar to (with p-value 0.0e+00)
>sp:sp|P53557|BIOB_BACSU BIOTIN SYNTHETASE (EC 2.8.1.-). >gp
:gp|AF008220|AF008220_77 Bacillus subtilis rrnB-dnaB genomic
region. NID: g2293135. >gp:gp|U51868|BSU51868_5 Bacillus su
btilis biotin biosynthetic operon genes, complete and partia
l cds. NID: g1277024. >gp:gp|Z99119|BSUB0016_93 Bacillus sub
tilis complete genome (section 16 of 21): from 2997771 to 32
13410. NID: g2635411.

atgctaatttttaagaaaaaggagttaaagattatgacattaaacctagctcaacgtgtg 
ttaaatcaagagtcattaacaaaagatgaagcaatatctattttcgaaaatgctgaaatt 
gatacatttgatttattaaatgaagcctacacagtgagaaaacattactatggtaaaaaa
gttaagcttaatatgatattaaatgctaaaagtggtatctgtgcagaaaattgtgggtac
tgtgggcaatctgtaaaaatgaaagaaaagcaacgttatgcacttgttgaacaggaccaa 
attaaagaaggcgctcaagtggcaactgaaaatcaaatcggtacatactgtattgttatg 
agtggtagaggtcctagtaacagagaagtcgatcatatttgcgaaacagtagaagatatt 
aaaaagatacacccacaactaaagatttgtgcgtgcttaggattaacgaaagaagaacag 
gctaaaaaattaaaggctgctggtgtcgatcgttataatcataatttaaatacgagtgag
cgttatcacgatgaagtagtaactacacatacatatgaggatagagtgaatacggttgaa 
atgatgaaagataataatatttctccttgttcaggtgtgatatgtggtatgggagagtcg 
aatgaggacattattgatatggcatttgctttaagagccatcgatgctgatagcattcct 
attaattttttacatcctattaaaggaactaaatttggtggattagatttattgtcacca
atgaaatgtttaagaattatagcgatgtttaggttaatcaatccaacaaaagaaattcga 
attgcaggtggacgggaggtaaatctacgttcattacaaccactcgcattgaaagcggct 
aattcaatttttgtaggagattacttaattacaggcggtcaaccgaatgaggaagattat 
cgcatgattgaagatttagggtttgaaatcgacagttaa 


Sequence 2562 

MLIFKKKELKIMTLNLAQRVLNQESLTKDEAISIFENAEIDTFDLLNEAYTVRKHYYGKK 
VKLNMILNAKSGICAENCGYCGQSVKMKEKQRYALVEQDQIKEGAQVATENQIGTYCIVM 
SGRGPSNREVDHICETVEDIKKIHPQLKICACLGLTKEEQAKKLKAAGVDRYNHNLNTSE 

RYHDEVVTTHTYEDRVNTVEMMKDNNISPCSGVICGMGESNEDIIDMAFALRAIDADSIP
INFLHPIKGTKFGGLDLLSPMKCLRIIAMFRLINPTKEIRIAGGREVNLRSLQPLALKAA 
NSIFVGDYLITGGQPNEEDYRMIEDLGFEIDS* 

Sequence 2563 
Contig_0774_pos_436_873,

is similar to (with p-value 3.0e-34)

>sp:sp|P39149|UPP_BACSU URACIL PHOSPHORIBOSYLTRANSFERASE (EC
2.4.2.9) (UMP PYROPHOSPHORYLASE) (UPRTASE). >pir:pir|S49364
|S49364 uracil phosphoribosyl transferase - Bacillus subtilis
>gp:gp|Z38002|BSSPORUPP_10 B.subtilis spoII-R, glyC and upp
genes. NID: g556877. >gp:gp|Z99122|BSUB0019_186 Bacillus su
btilis complete genome (section 19 of 21): from 3597091 to 3
809700. NID: g2636029.

atgttaaaagtattcaacactgaatcaagcatctctctatgttgggcggtgactacaaca 
acgggttccaggtcagaatctttctctaacgttttaatcaacggagccatttttatagct
tcaggcctagttccaaatatggtcataacttttttcatcaaaaatatctatctccgaatc 
tattatttagtaccgaataatctatcccctgcatcacctaaacctggtgtaatatacgct 
ttgtcatttaatttttcatctaatgccgcaatatatatatctacatctgggtgtgcttct 
tgcattttttcaacgccttcaggggcagctattaaacacataaaacgtatacttttagct 
ccacgtttttttaatgaagaaattgcttcaatagctgaagcaccagtagcaagcatagga
tcaaccacaataatttga 

Sequence 2564 

MLKVFNTESSISLCWAVTTTTGSRSESFSNVLINGAIFIASGLVPNMVITFFIKNIYLRI
YYLVPNNLSPASPKPGVIYALSFNFSSNAAIYISTSGCASCIFSTPSGAAIKHIKRILLA
PRFFNEEIASIAEAPVASIGSTTII* 

Sequence 2565 

Contig_0774_pos_2423_3718,
is similar to (with p-value 9.0e-29)

>pir:pir|S57509|S57509 integral membrane protein - Streptomy
ces pristinaespiralis >gp:gp|X84072|SPDNAPTR_1 S.pristinaesp
iralis ptr gene. NID: g872305.

gtgccattacaatcatcatacaatagtgatattggtactattaatatagcagttagctta 
tcggcactattttctggtctgtttattgtaggtgcaggagatattgcagataaaattggt
agagtaaaagtgacgtacataggcttagcacttaatattgttgggtcgattttaatcatt
attacgccattaccaagtctattgattattggacgtgctattcaaggattgtcagcggca 
tgtataatgccagcgacactcgcaatcattaatgaatattatatcgggacagcacgacaa 
cgtgcattaagctactggtccatcggttcatggggaggtagtggtgtttgtactttgttt 
ggtggtttaatggcaactaaccttggatggcgctcaatctttattgtttcaattattctg
acaatattatccatgtttctcattaaacatacacccgaaacaaaagcagagcctatagga 
gatcaaccgacagagacaaagaaatttgatgttgttggtttaatcatcttagtggttagt 
atgttaagtattaatgtgataataactcaaacctctcaatttggtttgttctcaccattt 
attttgggacttattgcaatttttgttatatcgttaattatattcgtgatttacgaaaat 
aaaatcaaacaaccacttgtcgattttgatatctttaaaaacaaaggttatacaggtgca
acgatttcaaactttatgttgaatggtgtagctggtggtacattgattgtagtaaatact
ttttatcagcaaaaattagattttaactctcaggaaacaggatatatttcacttacatac 
ctaattgcagtattaattatgatacgtgtgggtgaaatgatattacaatcgttaggacct 
aaaagacctttgttactcggaagtgccttgaccgtcataggattaatattattatctttg
acgtttttacctaatgcttggtatatagcgtcaagtgtcattggttatttattatttggt 
accggtttaggtgtttatgcaacaccatccacggatacagctgttgcacaagcaccagat
gataaagtaggcgtcgcatccggtgtatataagatggcatcatccttaggaaatgcattt 
ggtgtggccatctcaagtacagtttacagcgtacttgcagcccaacttaatctgacttta 
ggtggttttactggagtaatgtttaatgcgcttatagcattattagcattcctttctatt
ttgttcttaataccgaaaaaacagtctaatgtataa 

Sequence 2566 

VPLQSSYNSDIGTINIAVSLSALFSGLFIVGAGDIADKIGRVKVTYIGLALNIVGSILII 
ITPLPSLLIIGRAIQGLSAACIMPATLAIINEYYIGTARQRALSYWSIGSWGGSGVCTLF
GGLMATNLGWRSIFIVSIILTILSMFLIKHTPETKAEPIGDQPTETKKFDVVGLIILVVS
MLSINVIITQTSQFGLFSPFILGLIAIFVISLIIFVIYENKIKQPLVDFDIFKNKGYTGA 
TISNFMLNGVAGGTLIVVNTFYQQKLDFNSQETGYISLTYLIAVLIMIRVGEMILQSLGP 
KRPLLLGSALTVIGLILLSLTFLPNAWYIASSVIGYLLFGTGLGVYATPSTDTAVAQAPD 
DKVGVASGVYKMASSLGNAFGVAISSTVYSVLAAQLNLTLGGFTGVMFNALIALLAFLSI
LFLIPKKQSNV* 

Sequence 2567 

Contig_0774_pos_4133_5074,

putative peptide of unknown function

atggaatatatgaaaatagcaattgcaggttctggcgcattaggtagtggatttggtgct
aagttgtttcaacatggttatgacgtcactttaattgataattgggaacctcaagttact
acaatacaacaggacggtctacatatcgatattaatggtgaagcgcatcatttcaggcta 
cctatgtatagactaacggaaattcctaaagcaacgtcctatgatattgtttttctattt 

cctaaatctatgcaattagaagaggtgcttagtcatattcaaccccatcttcatgataat
acaattgttgtgtgcactatgaatggtttgaaacatgaacgtcttatacaacaatatgtt 
tctatagatagaattgtacgtggagtaacaacgtggactgccggtattgatcaacctggt 
cacacgcacttaatggggcaaggtcctgttgaaattgggtgtctcgtgcccgagggaaaa 
gaaagcgtagatatcattgttaatctgctacaaaatgcagaattaaaaggtgtaaaaagt 

gaacatttacatcaatcaatttggaagaaaatatgtgttaatggaacagctaattcatta
tgtactatacttgaatgtaatttggcagcactgaataatagtgatgacgctaaaaatttg 
atatataaaattacacaagaaattgttcatgttgcaacagttgatgatgttcatcttaat 
gttgatgagatttttgattacttaattgctttaaatgataaagtaggcccacactatcct 
tctatgtaccaagacttaattaaagataatcgaacaactgaaatagattatattaatgga 

gcagttagtaaattagggaaagagaatcatattgctacacctgtaaatgattttgtaaca
aatcttgtacatgctaaagaaaatcaacgtggtgcacaatga 

Sequence 2568 

MEYMKIAIAGSGALGSGFGAKLFQHGYDVTLIDNWEPQVTTIQQDGLHIDINGEAHHFRL 
PMYRLTEIPKATSYDIVFLFPKSMQLEEVLSHIQPHLHDNTIVVCTMNGLKHERLIQQYV
SIDRIVRGVTTWTAGIDQPGHTHLMGQGPVEIGCLVPEGKESVDIIVNLLQNAELKGVKS
EHLHQSIWKKICVNGTANSLCTILECNLAALNNSDDAKNLIYKITQEIVHVATVDDVHLN 
VDEIFDYLIALNDKVGPHYPSMYQDLIKDNRTTEIDYINGAVSKLGKENHIATPVNDFVT
NLVHAKENQRGAQ* 

Sequence 2569 

Contig_0774_pos_2605_2297,

putative peptide of unknown function 

atgattaaaatcgacccaacaatattaagtgctaagcctatgtacgtcacttttactcta 
ccaattttatctgcaatatctcctgcacctacaataaacagaccagaaaatagtgccgat
aagctaactgctatattaatagtaccaatatcactattgtatgatgattgtaatggcacg 
actagatttacaagtgattgtgcaaataaccaaaatgtaataacccctaaaataatccct 
aataataagcgattatctcccctaaacttttgtgatgtattcatctatcacttactcctt 
cacaattaa 

Sequence 2570

MIKIDPTILSAKPMYVTFTLPILSAISPAPTINRPENSADKLTAILIVPISLLYDDCNGT
TRFTSDCANNQNVITPKIIPNNKRLSPLNFCDVFIYHLLLHN* 

Sequence 2571 

Contig_0774_pos_0_597,

is similar to (with p-value 3.0e-65) 

>gp:gp|U81973|SAU81973_16 Staphylococcus aureus capsule gene
cluster Cap5A through Cap5P genes, complete cds. NID: g1773
339.

atgaaaaaagttatgaccatatttggaactaggcctgaagctataaaaatggctccgttg 
attaaaacgttagagaaagattctgacctggaacccgttgttgtagtcaccgcccaacat 
agagagatgcttgattcagtgttgaatacttttaacataagtgcagattatgatttgaat 

attatgaaagctggtcaaacattgtctgaagtaacatctgaagcaatgaaaaagttagaa
gatatcatacaaaaggaagtgcctgatatggtacttgttcatggtgatacagtgacaacc 
ttttctggagcattagccgcattttatagtcaaacacctataggacatgttgaagci:gga 
ttaaggagttataataaatattcaccttatcctgaagaaataaatagacaaatggttggg
gtaatggcagatttgcactttgccccaacctataatgctgcacagaatttagtaaaagag 

ggtaaattagccaaacatatagctatcactggtaatacagctattgacgcaatgaattat
acaatcgatcaccaatattcatcatctatcatacaaaaacataaaaataaaaaTACT 

Sequence 2572 

MKKVMTIFGTRPEAIKMAPLIKTLEKDSDLEPVVVVTAQHREMLDSVLNTFNISADYDLN 
IMKAGQTLSEVTSEAMKKLEDIIQKEVPDMVLVHGDTVTTFSGALAAFYSQTPIGHVEAG
LRSYNKYSPYPEEINRQMVGVMADLHFAPTYNAAQNLVKEGKLAKHIAITGNTAIDAMNY 
TIDHQYSSSIIQKHKNKNT

Sequence 2573 
Contig_0775_pos_5156_0,

putative peptide of unknown function 

gtgtttatagctgctgaacatcaaacgtcagttgatgagttagagtatatttatgttcca 
cctttaactacagaagctcaaaatgcctttgacgaacttaagcatgtgatgccaaaagta 
tgttataaagccaatttagaacgtttggcatcgttgccaaatataaataacactgaccat 

aatcctaatgctgaagcgcatcgtcacgctagcgattggagtgaagttcgtccagaatgg
ggtctagcacgaaatgctgaattcattattgggaaacgtcaaatcacccaaaatagtaat 
ctagagggacgggcatttcttcataattatgattggacaaaggatgaagacggtgagatt 
ttaaatacaattatttctgggccagcactagtagcacaatggattaatttacaatactac
gcctcaaccgtggcacctcactattatggaagcggtagtaaaacaacgcaaactgtaaca 

agtggtgtaggtgtcatgcaaggaaatgctagtgatttaatgtatggcttaccatggcag
tcagtaatgatgaatgacaaagaggcgtatcacgcacctattaggcttttaattgttatt 
caagcgccagatgcatatattcaacgtttgttaaaacatcataatcactttagacaaaag 
gttgatcatcaatggataagacttgccagtattgatgaaaataatagttggaaagactgg 
ta 

Sequence 2574 

VFIAAEHQTSVDELEYIYVPPLTTEAQNAFDELKHVMPKVCYKANLERLASLPNINNTDH 
NPNAEAHRHASDWSEVRPEWGLARNAEFIIGKRQITQNSNLEGRAFLHNYDWTKDEDGEI 
LNTIISGPALVAQWINLQYYASTVAPHYYGSGSKTTQTVTSGVGVMQGNASDLMYGLPWQ 
SVMMNDKEAYHAPIRLLIVIQAPDAYIQRLLKHHNHFRQKVDHQWIRLASIDENNSWKDW
X

Sequence 2575 
Contig_0775_pos_3674_2475, 
is similar to (with p-value 2.0e-92)

>sp:sp|P44953|DFP_HAEIN DNA/PANTOTHENATE METABOLISM FLAVOPRO
TEIN HOMOLOG. >pir:pir|G64104|G64104 pantothenate metabolism
flavoprotein (dfp) homolog - Haemophilus influenzae (strain
Rd KW20) >gp:gp|U32776|U32776_9 Haemophilus influenzae Rd s
ection 91 of 163 of the complete genome. NID: g1573969.

atgaaacatattttattagctgttacaggcggtatcgcagcatataaagcaattgattta 
acaagtaaattaatacaatccggctatgatgtaagagttatgctatctgatcatgctcaa 
gagtttgttactccgctagcttttcaagcaatcagtagaaatcctgtttacacaaataca 
tttaaagaagaaaatcctgaagagattcaacatgtatcattaggagactgggcagatgcg
attatagtcgcgccagcaactgctaatactatcgcaaaattaagtgttggaattgctgat 
gatttaattacttctacattacttgctacaacaacaccaaaattcgttgcacccgcaatg
aatgtaaatatgtataacaatccacgtactaaacataatatgaaagtgctaagtcaagac 
ggctattattttattgaacctggtagtggctatttagcatgtggttatgtagcaaauggg 

cgaatggaagaacccatgcaaatcctatctgttattaataaattttttactcaacagaag
aatgttgtcaaaagctctttttctggaaagcgcgcattagttacagctgggcctacagtt 
gaagttattgatcctgttcgatacgtatcaaatcgttcatcaggaaaaatgggatatgct 
atagctgaagcattacgagataagggagcaatcgtaactttaattagtggtcccacccac 
ttatctctacctgaagggattaatgtagtaaaagttgagagtgcagatgatatgtttcaa 

gctgtaaccgaacgctttgcgaaacaagatatagtgattaaagcagcggcggtgtctgat
tatacaccaatggacatacttgaacataaattaaaaaaacaagaaggaggattatctgtt 
caatttaagcgtacaaaggatattttaaaatacttaggagaaaataaaacgcaccaatat 
cttgttggttttgctgctgaaacacaaaatattgaacagtatgctctagacaaactcaaa 
agaaaaaatgcagatgttatcatttcgaacaatgtaggtgatacatccataggctttagt 

tcagatgacaatgaattaactatgcattttaaaaataatgaaaaagtaaatattaagaaa
ggaaaaaaatcagctttagcacatcaaattatagaaattttagaaactaggtggcagtaa 



Sequence 2576 

MKHILLAVTGGIAAYKAIDLTSKLIQSGYDVRVMLSDHAQEFVTPLAFQAISRNPVYTNT
FKEENPEEIQHVSLGDWADAIIVAPATANTIAKLSVGIADDLITSTLLATTTPKFVAPAM 
NVNMYNNPRTKHNMKVLSQDGYYFIEPGSGYLACGYVAKGRMEEPMQILSVINKFFTQQK 
NVVKSSFSGKRALVTAGPTVEVIDPVRYVSNRSSGKMGYAIAEALRDKGAIVTLISGPTH
LSLPEGINVVKVESADDMFQAVTERFAKQDIVIKAAAVSDYTPMDILEHKLKKQEGGLSV
QFKRTKDILKYLGENKTHQYLVGFAAETQNIEQYALDKLKRKNADVIISNNVGDTSIGFS
SDDNELTMHFKNNEKVNIKKGKKSALAHQIIEILETRWQ*

Sequence 2577 
Contig_0775_pos_2367_67, 
is similar to (with p-value 0.0e+00)

>sp:sp|P94461|PRIA_BACSU PRIMOSOMAL PROTEIN N' (REPLICATION
FACTOR Y) (FRAGMENT). >gp:gp|Y10304|BSPRIADFS_1 B.subtilis p
riA, def, fmt, sun genes. NID: g1772497.

gtggtaccttttggccctagaactatacaaggatacgtcatgaatattcaacaaaaacca 

gatggaaatatggatatatcgaaactaaaagaaataaaagaagtacgtgatattaaacct
gaattaacatccgaactgattcaattaagcgaatggatgagccattatcatgtgatgaaa 
cgtatttctgttttagaagcgatgttgccaagtgccattaaagcaaagtataagaaagct 
ttttcaattatcgatccaaaaaatttatcttcaaaaaccaaagcgctatttaacaatgac 
ggttattacttatataaagaagttcagcaaaacaatgatttagaagaaatgttgactttg 

ttaaatcaaggattgattgaagaggtcacgatactttctcaaaacacaaaaaagaaaact
caaaaagctgttggcgtagttaatacgttgaatggtgatgaagtacttgcaaaactcgag 
aaatatacaaaacaatatgatttgtatgcatttttattagaagagtctcatcgaacagtg 
tttttaaaagaaatcaatgatatgggcttctctcactcgagtttagattctttaatcaaa 
aaaggctatattgaaaaatatatcgccgaggttttcagagatccatatgcaaatcgtata

tttgaacaagaacaaaagaggatattaactaaagaacagcaagatgcatttgaagctatt
caacattatattcatgatgaaaaagaaagaacatttttattacacggagtcacaggttca 
ggtaaaaccgaagtctatcttcaaacaatagaagaagttcttaataaaggtaaagaagcc 
atgatgttagtgccggaaatcgccttaacacctcagatggtactaagatttaaacgtcga 
tttggagatgatgtagcggtattacattccggactttcaaaaggtgaacgttatgatgag 

tggcaaaaaattagagacggtcgagctcgagtgagtgtaggtgctcgttcaagtattttc
gcaccgtttaaaaatttaggcatcattataattgatgaagaacatgaatctacatataaa 
caagaagattatcccagatatcatgcacgtgatattgcacaatggagaagtcaatttcat 
cattgtcctgtagttttaggtagcgcaacaccgagtcttgagtcatatgcaagagcagaa 
aaaaatgtttacgagttgttgtcattgccacatagagtcaatcaacaagcgttaccgcat 
atcgatattatagatatgagagaagaattaagtgaaggtaatcgttccatgttttctata
gcactaagacaagcgatacaagaacggttggataaaaaagaacaaatagtactattctta 
aatagaaggggatatgcttcatttatgttatgtagagattgtggttacgttccccaatgt 
ccccattgtgatatttcgttaacatatcataaaacaaccgatcaattaaaatgtcattac 
tgtggttatcaagaaaatccaccatctcaatgtccaaattgtgaaggtgatcatatcaga
caagtcggaactggaacgcaacgtgtagaagaattattacaacaagaattccctcatgct 
cgtattataaggatggatgttgatacaacttcaagaaaaggtgcacatgagaaattgtta 
aatgactttgaagcaggaaaaggagatatcttattaggtacgcaaatgattgctaaaggt 
ttggattatcctaacattactctagttggtgtgctcaatgctgacactatgttaaactta 

cctgactttcgtgccagtgaacgaacataccaacttttaactcaggtatctggacgcgca
ggtcgtcatgaaaaagaaggacaagttatcatacaaacgtacaaccctgatcattattca
ataaaggatgtgaaattaaacgattatctttctttttatcaaaaagaaatgaattatcga
aaattaggaaaatatccaccttactattttttgattaactttaccatttcacatactgat 
ataaaaaaggtcatgatggcctctaagcatatacatcaaattttagtacagcacttaagt 

gaaaaagcattcgtgctaggcccttcaccagcagcactagcaagaattaacaatgagtat
cgttttcaaatactagtaaaatataagagtgagcctcaattacatcaagcgttacaatat 
ttagatgattataatcatgatcaatatgtaaaggataaactatcattaaaaattgatatc 
aatccacaaatgatgatgtga 

Sequence 2578

VVPFGPRTIQGYVMNIQQKPDGNMDISKLKEIKEVRDIKPELTSELIQLSEWMSHYHVMK
RISVLEAMLPSAIKAKYKKAFSIIDPKNLSSKTKALFNNDGYYLYKEVQQNNDLEEMLTL
LNQGLIEEVTILSQNTKKKTQKAVGVVNTLNGDEVLAKLEKYTKQYDLYAFLLEESHRTV
FLKEINDMGFSHSSLDSLIKKGYIEKYIAEVFRDPYANRIFEQEQKRILTKEQQDAFEAI

QHYIHDEKERTFLLHGVTGSGKTEVYLQTIEEVLNKGKEAMMLVPEIALTPQMVLRFKRR
FGDDVAVLHSGLSKGERYDEWQKIRDGRARVSVGARSSIFAPFKNLGIIIIDEEHESTYK
QEDYPRYHARDIAQWRSQFHHCPVVLGSATPSLESYARAEKNVYELLSLPHRVNQQALPH
IDIIDMREELSEGNRSMFSIALRQAIQERLDKKEQIVLFLNRRGYASFMLCRDCGYVPQC 
PHCDISLTYHKTTDQLKCHYCGYQENPPSQCPNCEGDHIRQVGTGTQRVEELLQQEFPHA 

RIIRMDVDTTSRKGAHEKLLNDFEAGKGDILLGTQMIAKGLDYPNITLVGVLNADTMLNL
PDFRASERTYQLLTQVSGRAGRHEKEGQVIIQTYNPDHYSIKDVKLNDYLSFYQKEMNYR 
KLGKYPPYYFLINFTISHTDIKKVMMASKHIHQILVQHLSEKAFVLGPSPAALARINNEY 
RFQILVKYKSEPQLHQALQYLDDYNHDQYVKDKLSLKIDINPQMMM* 

Sequence 2579

Contig_0776_pos_1350_2054,

is similar to (with p-value 2.0e-50) 

>sp:sp|P54163|YPDP_BACSU HYPOTHETICAL 25.7 KD PROTEIN IN BCS
A-DEGR INTERGENIC REGION. >gp:gp|L77246|BACYACA_10 Bacillus
subtilis (YAC10-9 clone) DNA region between the serA and kdg
loci. NID: g1256615. >gp:gp|Z99115|BSUB0012_140 Bacillus su
btilis complete genome (section 12 of 21): from 2195541 to 2
409220. NID: g2634478.

atgtataatgaaatatttggtattgcgtcatttattgttacattcgctttaatggtactg 
atgtatcgctgttttggtaaacaaggactaattgcttgggtagcaataggaacgattatc
gctaatatacaggtcataaaagcggttcatatttttggtattacggctacacttggaaat
gtcatgtttgcttctatatatttagctactgatatattaaatgacatctatggtcgtaaa 
gttgctaaaagagcggtgtggcttggtttctcttctaccttagtaatgattatagtcatg 
caaatgtcattgcattttattcctgccccagtagacaatgcgcaaaactcattaaaaatg 
atttttgatttagtgcctagaattgctataggttccattattgcttatatcataggccaa
catattgatgtatttatattcagtatgattaaaaagatatttagctctgataagaccttt 
tttattagagcatatggtagtaccattttaagttctatcattgataccggtttatttgtt
tcaattgcttttattggtactatgcctggtactgctgtttttgaaatatttattaccact 
tacttgttaaaactagtgtcaactatttttaatgtaccatttggatatatcgctaagtca
ctatatcgaaaaggaaagatagaacaactagataatgggtattga

Sequence 2580 

MYNEIFGIASFIVTFALMVLMYRCFGKQGLIAWVAIGTIIANIQVIKAVHIFGITATLGN 
VMFASIYLATDILNDIYGRKVAKRAVWLGFSSTLVMIIVMQMSLHFIPAPVDNAQNSLKM
IFDLVPRIAIGSIIAYIIGQHIDVFIFSMIKKIFSSDKTFFIRAYGSTILSSIIDTGLFV
SIAFIGTMPGTAVFEIFITTYLLKLVSTIFNVPFGYIAKSLYRKGKIEQLDNGY* 

Sequence 2581 
Contig_0776_pos_2379_2771,

is similar to (with p-value 3.0e-19) 

>sp:sp|P36921|EBSB_ENTFA CELL WALL ENZYME EBSB. >pir:pir|B49
939|B49939 ebsB protein - Enterococcus faecalis >gp:gp|L2380
2|ENEEBSA_2 Enterococcus faecalis pore forming, cell wall en
zyme, regulatory, and dehydroquinase homologue proteins (ebs
A, ebsB, ebsC, and ebsD) genes, complete cds with repeat region
. NID: g388106.

atggctaaaatacattttgatgctgcgactaaaggaaatcccggccgaagtgcttgtgcg 
attattattaaagaaaattcacaaagatatacatttacccatgatttaggtgaaatggat 
aatcatagtgcagaatgggcagcaatgttacacgctttggaacatgcacgcgaattaaaa
gtatctaacgcgttactttttactgattcaaaattaattgaagatagtatgatgcaaggt
aaagttaaaaatgctaagtttaaagtttattttgaaaacatagaaatcttagagcaaagt 
tttgatttgatgtttgtgagatggattccacgaaagcaaaataaagaagcgaatcaactt 
gctcaacaaacactatacaaacttacatcataa 






Sequence 2582 

MAKIHFDAATKGNPGRSACAIIIKENSQRYTFTHDLGEMDNHSAEWAAMLHALEHARELK 
VSNALLFTDSKLIEDSMMQGKVKNAKFKVYFENIEILEQSFDLMFVRWIPRKQNKEANQL 
AQQTLYKLTS* 



Sequence 2583 

Contig_0776_pos_5545_4955,

is similar to (with p-value 3.0e-22) 

>gp:gp|AL034447|SC7A1_23 Streptomyces coelicolor cosmid 7A1.
NID: g4007715.

gtgttcgttatggaaaaagaagaaatctttcttcctgaaagtgaaaaaaatgggaaacct 
atcgacttattttttatatggtttgcagccaatctaggtattctcgggatagtttatggc 
gctgtcatagtaagttacggattgagctttttacaatcaattttgattgctataatagga 
ccgttatcctttgttctagttggttatataagtgtagctggtagagatagtggagctata 

acttttatgctctcaagagcaccatttggatttaaaggcaatcatatacctgctttaatt
ggctgggtaggtcaagttggttggttatctgttaatgtttctacaggaactttaactctt 
ctggctttattcaatacttttggttttaagactagtacatttctaattttgatgagttta 
gcgatttttgctgggctagttattatatctgttcttttttcacaaaaagtacttgtatca 
gtacaaacatttttcacatatgtatttggtgcattaaccttattagttataacaatttta 

attactaatactgattggaacgcccttttttctatgaaatctgggtcttaa

Sequence 2584 

VFVMEKEEIFLPESEKNGKPIDLFFIWFAANLGILGIVYGAVIVSYGLSFLQSILIAIIG 
PLSFVLVGYISVAGRDSGAITFMLSRAPFGFKGNHIPALIGWVGQVGWLSVNVSTGTLTL 
LALFNTFGFKTSTFLILMSLAIFAGLVIISVLFSQKVLVSVQTFFTYVFGALTLLVITIL
ITNTDWNALFSMKSGS* 



Sequence 2585

Contig_0776_pos_4784_2949,

putative peptide of unknown function

gtgataaaagagttatcacaaaagaaacgagatgcaataaataacaacactgatttaaca
ccttctcaaaaggcacatgctttagcagatattgataaaacagaaaaagatgcacttcaa
catatcgaaaattctaattcaattgatgatatcaataacaataaagagcatgcatttaat 
actttagctcatatcattatttgggatactgatcagcaaccattagtttttgaactacct 

gaattgagccttcaaaatgctctagtaacaagtgaggtggttgttcacagagatgaaact
atttcattagaatctataattggagctatgactttaactgatgaacttaaagtcaatatt 
gtttcattaccgaacactgataaagtagctgatcacctaaccgctaaagttaaggttatt 
ttagctgatggctcatttgtcactgtaaatgttccagtcaaagttgtagaaaaagaatta 
caaatagctaaaaaggatgctataaaaacaattgatgttctggtaaaacaaaaaatcaaa 
gatatagattctaataacgaattaacgtctactcaacgtgaagatgcaaaagctgaaatt
gaaagattgaaaaagcaagccatcgataaagtgactcattctaaatcgattaaagatatt 
gaaacagtaaaacgaactgattttgaagaaatagatcagtttgatcctaaacgctt^iacg 
ctaaataaagctaaaaaggatatcattactgatgttaatactcaaatccaaaatggtttc
aaagaaattgaaacaataaaaggtttaacttctaatgaaaaaactcagtttgataaacaa
ttaactgcactacaaaaagaatttttagaaaaagtcgagcatgctcataatttagtagaa 
ttaaatcaattacaacaagagtttaataatagatatgaacatattttaaaccaagcacat 
ttactaggtgaaaaacatatagcagaacataaattaggatatgttgtagtaaacaaaact 
cagcaaatactaaataatcaatctgcttcttactttataaaacaatgggcacttgataga 

attaaacaaattcaactagaaacgatgaattcaattcgtggtgcgcataccgtacaagat
gtacacaaagcattgttacaaggtatagagcaaatcttgaaagtaaatgtaagtattata 
aatcaatctttcaacgattccttgcataactttaattatcttcattcaaaatttgatgct 
agattaagagaaaaggatgttgcaaaccatatcgtacaaactgaaacattcaaagaagtt 
ctaaaaggaacgggtgttgaaccaggtaaaatcaacaaagaaacacagcaaccaaaactt 

cataagaatgataatgatagcctattcaaacatttagttgataatttcggcaaaactgta
ggtgttattacattaactggtttactttctagtttctggttagttttggctaaaagacgt 
aaaaaagaagaagaaaaacaatcgataaaaaattatcacaaagacattcgtctttcagat 
actgataaaatagatccaattgtaataactaagcgtaaaatagataaagaagaacaaatt 
caaaacgatgacaaacattcaattccagttgctaaacataagaaatctaaagaaaagcaa

ttgagtgaaggagatattcattcaatccccgtcgttaaacgtaaacaaaacagtgataac
aaagatacaaaacagaagaaagttacttctaaaaagaagaaaacgcctcaatcaactaaa 
aaagttgtaaaaaccaaaaagcgttctaaaaagtaa 

Sequence 2586 

VIKELSQKKRDAINNNTDLTPSQKAHALADIDKTEKDALQHIENSNSIDDINNNKEHAFN
TLAHIIIWDTDQQPLVFELPELSLQNALVTSEVVVHRDETISLESIIGAMTLTDELKVNI 
VSLPNTDKVADHLTAKVKVILADGSFVTVNVPVKVVEKELQIAKKDAIKTIDVLVKQKIK 
DIDSNNELTSTQREDAKAEIERLKKQAIDKVTHSKSIKDIETVKRTDFEEIDQFDPKRFT 
LNKAKKDIITDVNTQIQNGFKEIETIKGLTSNEKTQFDKQLTALQKEFLEKVEHAHNLVE 

LNQLQQEFNNRYEHILNQAHLLGEKHIAEHKLGYVVVNKTQQILNNQSASYFIKQWALDR
IKQIQLETMNSIRGAHTVQDVHKALLQGIEQILKVNVSIINQSFNDSLHNFNYLHSKFDA
RLREKDVANHIVQTETFKEVLKGTGVEPGKINKETQQPKLHKNDNDSLFKHLVDNFGKTV 
GVITLTGLLSSFWLVLAKRRKKEEEKQSIKNYHKDIRLSDTDKIDPIVITKRKIDKEEQI 
QNDDKHSIPVAKHKKSKEKQLSEGDIHSIPVVKRKQNSDNKDTKQKKVTSKKKKTPQSTK 

KVVKTKKRSKK*

Sequence 2587 

Contig_0776_pos_587_147, 

is similar to (with p-value 9.0e-29) 

>sp:sp|P54170|YPHP_BACSU HYPOTHETICAL 15.9 KD PROTEIN IN ILV
D-THYB INTERGENIC REGION. >gp:gp|L77246|BACYACA_21 Bacillus
subtilis (YAC10-9 clone) DNA region between the serA and kdg
loci. NID: g1256615. >gp:gp|Z99115|BSUB0012_128 Bacillus su
btilis complete genome (section 12 of 21): from 2195541 to 2
409220. NID: g2634478.

atgatgaatggatacgaagcttatatgaaagaacttgcacaacaaatgagagctgaatta 
acagacaatggattcacaagtcttgaaacgagtgatgacgtcaatcagtatatgcaaaat 
atagataatgatgatacaacatttgttgtaatcaactcaacatgcggttgcgctgcagga 
ttagcacgtccagcagctgttgcagttgcagagcaaaatgaagtgaaaccagatcataaa 

gtaactgtatttgctggtcaagataaagaagcaacacaaacaatgagagattacatccaa
caagttccttcaagtccctcatacgcattatttaaaggtcaacatttagttcattttata 
cctcgcgaacatattgaagggcgcgacatcaatgatatagctatggatttaaaagatgct 
tttgatgataattgtcaataa 

Sequence 2588

MMNGYEAYMKELAQQMRAELTDNGFTSLETSDDVNQYMQNIDNDDTTFVVINSTCGCAAG 
LARPAAVAVAEQNEVKPDHKVTVFAGQDKEATQTMRDYIQQVPSSPSYALFKGQHLVHFI 
PREHIEGRDINDIAMDLKDAFDDNCQ* 

Sequence 2589

Contig_0777_pos_367_684,

putative peptide of unknown function 

atgaggagaatatatatgaaaagattattaggtacattaattgctgctacactagtgtta 
agtgcttgtagccaaaacgacactaaggaagatgaaaataaaaagtcagaaaatactact
gaaaagaaatctgacgataaaaaagacaaaaaaactaatgaggataaaaagtctggagaa
caaaagaaatctcaagaaaaaaagaataacaagtcaatgcaagaatctgctacaaatgaa 
caggttcaatctcaacaacaaacttcacaaggtgtaccatcaaaatgctctttgccgatg 
actgttttatcaatttga 

Sequence 2590 

MRRIYMKRLLGTLIAATLVLSACSQNDTKEDENKKSENTTEKKSDDKKDKKTNEDKKSGE 
QKKSQEKKNNKSMQESATNEQVQSQQQTSQGVPSKCSLPMTVLSI* 

Sequence 2591

Contig_0777_pos_6354_6046, 

putative peptide of unknown function 

atgactgcatttttaattcttaatttattgactatgaaagaggcttcaatatcgtctatt 
attgtacgaactgtcattgcagctattgttttctttgtcatttatatcattgtatttaca 
attttaagttcgtcagaacgtaaacttatttatggtacaactttgcctattgcgcttttt
atatgccttatattcggagcaatttttttcactccgcgtataggtatcattgccggacta 
attataggtgtgtttgctggtgtcatatgggagttcttaaatagaaaaaatggaggtcgc 
tcatcttga 

Sequence 2592

MTAFLILNLLTMKEASISSIIVRTVIAAIVFFVIYIIVFTILSSSERKLIYGTTLPIALF 
ICLIFGAIFFTPRIGIIAGLIIGVFAGVIWEFLNRKNGGRSS* 

Sequence 2593 
Contig_0777_pos_4678_1709,

is similar to (with p-value 0.0e+00)

>sp:sp|P40816|T3RE_SALTY TYPE III RESTRICTION-MODIFICATION S
YSTEM STYLTI ENZYME RES (EC 3.1.21.5). >pir:pir|JN0658|JN065
8 restriction endonuclease (EC 3.1.-.-) - Salmonella typhimu
rium

atgaaaatattcttagaagaactaattcaccaacaacaagcagttaaaaaaataatagat 
actttcacaggaatcgaaaagtatttaacttcaaaaaattgtgacaatgagttcgctaat 
aatttaataataaatagatattcagaagaagctaacatagatataaaaatggagaccggg
acaggtaaaacgtatgtttatactaaaatgatgtatgaattacataaaaaatttgggatt 

tttaaatttatattagtagttccaagtcctgcaattaaagagggagcaaagaatttttta
actagtttatcaactaaaagacatttccaagaaacatacggaaatgttgaaatagaaata 
aatacaataaataaaggcgattttaatactagatcaggcagaaaaatttttccacctcat 
ttgagtaactttattgaaagcagtaatttaaatgctaatcaaattcaagtattacttatt 
aatgcaggtatgttaaattcatcaaatatgacaaaggttgattacgaccaaactttatta 

agtaattactcaaaccctatagaagctttaaaagctaccaaatctgtagttataatagac
gaaccccataggtttcctagggataagaagaactacaaatctatagaaaatcttgaacct 
caaatgattgttaggtttggtgctacttttcctgaagtaaaaaaaggaacaggaaaaaaa 
gctgtatatattaaagattattatcgaggtaggcctcaatttgaattgaatgctgttgac 
agctttaatcagggacttgtaaagggaatagatatttactatccaaatttgactccagaa 

caagcgaaaaatagatataccattgatagtgtaaaagccaaagagatagttttgaaaaaa
ggaaaaaataaatggacgctaggtataggcgagaatttagcaaatatagattcgttattt 
gaaggtgacttaagttattctggtgctaaaactttatctaatgacttagaaattagtaaa 
ggtatggatttattgcctggcacatttacaactaattaccaggaattaattattaatgat 
gctatcaatcaacattttgagcatgaaatcaataattttatgagagataacctaaaagaa 

aattttcaaccaaaagtaaaaacactatcattattttttattgattctattaggtcatat
agaaacaaagagggctggttgaaacagacttttgagagactcttaaaagttaaacttaga 
aaattaataaaagaatttgaagttaaaaaattaccacgagagattgaatacttagatttt
ttaagagtgacatatgatagtcttaatagtgagaatcaaatggtccatgcagggtatttt 
ggtgaagacagaggttccggtgatgaagcaatacaggctgaagttaatgatatacttaaa 
aataaagaaaaaatgctgagttttaaagacgagaatggtaactggattacaagaagattt
ttattttcaaaatggacattaagagaaggttgggataaccctaatgttttcgttattgcg 
aaattgagaacttctggttctgaaaatagcaagatacaggaagtaggaagaggtttgcgt 
ttacctgttgatgagaatggtcacagattaacacaagatgaatttccaagtagactttca 
tttttaattggatatgatgaaaaggatttcgcagaaaaattaattggagaaataaattct
gatgtagacgttaaattaagtgaagataaattaacagatgaaatgattaataaaatagtt 
gaacatagaaagcaagtggatcctcattataatgatgaagtgttattagaacaattagat 
gaacgaaatctaattaataggaaaaatgagtttaaaacagatgttgaaattgatggagtt 
aaaaaatcaggatttgaatggctgttagaattatatccagaagttaatgcatctaaatta 

aatgcagataaagtcagagatatgaaaaaaaatccacccaatttaaaagtgaagttaaat
aaagagaattggaataaattgagattcttatgggaaaatttatctaaaagatatatgtta 
gaattcaaaaagatgagtgaagatgatctgtatttgtttgttgaaagattgttgaacgac
gatgatttatttgttaaacaacaaccagaaagaatacatcaatctttggaaaaagatgat 
gaaggtaaacaagttatcaaggaatctatttcagaatataattatagaaatgaatttatt 

tatatgaactatggtaagtttttaaaacaaattgtacttaaaactaatgtaccaatatct
ataatgcataaaaatttactttccgttttaaaagataaatataatagtgatgaacgattc 
ttgagtgaattaagtttaaataatataataagagaatttaacaagcgttttgaagaaaaa 
tatagtcaaagttatgaatataagaaattagatttttctgctactacaaccatttatgat 
tcagaaatatcagagtttaaagattgggtagatgcaaattatttaggtactaacgttgaa 

aataacattcaaactgaaaaaagatttttatatgaaagaccaccagttagatatgatagt
gtaacacctgagttagagttgttaaaaagaaattacgataaaaatgtaactgtatttggt 
aatttgcctaaaaaagcgatacaagttcctaaatatactggtggcactactacgcctgat 
tttgtctatatgatagaaactgatgaacaagatgcaaaataccttattgttgaaacaaaa 
gcagaaaacatgagactaggagataaaagtattggtgaaatacaaaaaaaattctttaac 

acattagataatttgaatattaaatatcaattagctactagcgcgcaagatgtttataat
gaaattaaaaaattagatgattcaaagtga 

Sequence 2594

MKIFLEELIHQQQAVKKIIDTFTGIEKYLTSKNCDNEFANNLIINRYSEEANIDIKMETG

TGKTYVYTKMMYELHKKFGIFKFILVVPSPAIKEGAKNFLTSLSTKRHFQETYGNVEIEI
NTINKGDFNTRSGRKIFPPHLSNFIESSNLNANQIQVLLINAGMLNSSNMTKVDYDQTLL 
SNYSNPIEALKATKSVVIIDEPHRFPRDKKNYKSIENLEPQMIVRFGATFPEVKKGTGKK
AVYIKDYYRGRPQFELNAVDSFNQGLVKGIDIYYPNLTPEQAKNRYTIDSVKAKEIVLKK 
GKNKWTLGIGENLANIDSLFEGDLSYSGAKTLSNDLEISKGMDLLPGTFTTNYQELIIND 

AINQHFEHEINNFMRDNLKENFQPKVKTLSLFFIDSIRSYRNKEGWLKQTFERLLKVKLR
KLIKEFEVKKLPREIEYLDFLRVTYDSLNSENQMVHAGYFGEDRGSGDEAIQAEVNDILK 
NKEKMLSFKDENGNWITRRFLFSKWTLREGWDNPNVFVIAKLRTSGSENSKIQEVGRGLR
LPVDENGHRLTQDEFPSRLSFLIGYDEKDFAEKLIGEINSDVDVKLSEDKLTDEMINKIV 
EHRKQVDPHYNDEVLLEQLDERNLINRKNEFKTDVEIDGVKKSGFEWLLELYPEVNASKL 

NADKVRDMKKNPPNLKVKLNKENWNKLRFLWENLSKRYMLEFKKMSEDDLYLFVERLLND
DDLFVKQQPERIHQSLEKDDEGKQVIKESISEYNYRNEFIYMNYGKFLKQIVLKTNVPIS 
IMHKNLLSVLKDKYNSDERFLSELSLNNIIREFNKRFEEKYSQSYEYKKLDFSATTTIYD 
SEISEFKDWVDANYLGTNVENNIQTEKRFLYERPPVRYDSVTPELELLKRNYDKNVTVFG 
NLPKKAIQVPKYTGGTTTPDFVYMIETDEQDAKYLIVETKAENMRLGDKSIGEIQKKFFN 

TLDNLNIKYQLATSAQDVYNEIKKLDDSK*

Sequence 2595 

Contig_0777_pos_1393_623, 

is similar to (with p-value 0.0e+00)

>gp:gp|U92974|LLU92974_13 Lactococcus lactis unknown gene, p
artial cds, and HisC (hisC), unknown, HisG (hisG), unknown,
HisB (hisB), unknown, HisH (hisH), HisA (hisA), HisF (hisF),
HisIE (hisIE), unknown, unknown, LeuA (leuA), LeuB (leuB),
LeuC (leuC), LeuD (leuD), unknown, IlvD (ilvD), IlvB (ilvB),
IlvN, IlvC (ilvC), IlvA (ilvA), AldB (aldB) and aldR (aldR)
genes, complete cds. NID: g2565137.

atggattatagagtactactttattataaatatgtaactatagatgaccctgaaactttt 
gcagccgaacatttgaaattttgtaaggaacatcatttaaaaggaagaatactagtttca 
acggaaggcattaatggaacattatctggaacaaaagaagatactgataaatatatagag 
catatgcatgcagatagtcgttttgctgatttaacttttaaaattgatgaagctgaaagt
catgcgtttaaaaagatgcacgtgcgtccaagacgtgaaattgttgcacttgacttagaa 
gaagatattaatccacgtgaaattaccggtaaatactattctcctaaagaatttaaagcc 
gcactagaagatgaaaatactgttatattagatgctcgaaatgattatgaatacgattta 
ggacatttccgtggagctattcgtcctgatataacacgattccgtgacttacctgaatgg
gtgcgtaataataaagaacaactcgacggaaaaaatattgtcacatattgtacaggtggc 
attcgttgtgaaaaattttctggttggttagtaaaagaaggatttgaaaacgtaggtcag 
ttgcatggtggtattgctacatacggtaaagaccctgaaactaaagggctatattgggat 
ggtaagatgtatgtatttgatgaacgtattagtgtcgatgtgaatcaaattgataaaaca 
gtcatcggcaaagagcattttgatggtacaccttgtgaagtttgttgttga

Sequence 2596 

MDYRVLLYYKYVTIDDPETFAAEHLKFCKEHHLKGRILVSTEGINGTLSGTKEDTDKYIE 
HMHADSRFADLTFKIDEAESHAFKKMHVRPRREIVALDLEEDINPREITGKYYSPKEFKA 
ALEDENTVILDARNDYEYDLGHFRGAIRPDITRFRDLPEWVRNNKEQLDGKNIVTYCTGG
IRCEKFSGWLVKEGFENVGQLHGGIATYGKDPETKGLYWDGKMYVFDERISVDVNQIDKT 
VIGKEHFDGTPCEVCC*

Sequence 2597 
Contig_0778_pos_324_815,

is similar to (with p-value 5.0e-52) 

>sp:sp|P38424|YSXC_BACSU HYPOTHETICAL GTP-BINDING PROTEIN IN
LONA-HEMA INTERGENIC REGION (ORFX). >pir:pir|S45102|S45102
hypothetical protein X - Bacillus subtilis >gp:gp|X76424|BSL
ONLA_3 B.subtilis lon gene for protease La. NID: g495556. >g
p:gp|Z99118|BSUB0015_84 Bacillus subtilis complete genome (s
ection 15 of 21): from 2795131 to 3013540. NID: g2635200. >g
p:gp|Z75208|BSZ75208_85 B.subtilis genomic sequence 89009bp.
NID: g1769994.

atgaatataaattttaataatattaacttaattataagtgctgtaaaaaaagcacagtat
cctgacactggattaacagaagtagcgttaagtggacgctcaaatgtagggaaatctaca 
tttattaatagtatgattgggcgtaaaaatatggcgagaacgtcacaacaacctggtaag 
acacagacattgaatttctataatatagatgaacaacttatttttgttgatgtaccagga 
tatggatacgctaaagtaagtaaagttcaacgagaaaaatttggtaaaatgattgaagaa 

tatattacacaacgagagaatttaaaacttgttattcaacttgtcgatttaagacatcaa
cctactgaagatgatgtgcttatgtacaattatcttaaacattttgatataccaacactt 
gtaatatgtactaaagaagataaaattgccaaaggaaaagtacgagggtccagagaggct 
ccgaagatatga 

Sequence 2598

MNINFNNINLIISAVKKAQYPDTGLTEVALSGRSNVGKSTFINSMIGRKNMARTSQQPGK 
TQTLNFYNIDEQLIFVDVPGYGYAKVSKVQREKFGKMIEEYITQRENLKLVIQLVDLRHQ 
PTEDDVLMYNYLKHFDIPTLVICTKEDKIAKGKVRGSREAPKI* 

Sequence 2599

Contig_0778_pos_2037_3047,

is similar to (with p-value 8.0e-80) 

>sp:sp|P41006|PYRP_BACCL URACIL PERMEASE (URACIL TRANSPORTER
). >pir:pir|S38893|S38893 uracil transport protein - Bacillu
s caldolyticus >gp:gp|X76083|BCPYRQP_2 B.caldolyticus (DSM40
5) pyrR, pyrP and pyrB (partial) genes. NID: g431229.

atgctggttgcattatttatgagtggattaatgtacgtgattataggtattttcattaaa 
ttgagtggaacacattggttaatgcacttgttaccaccagtagttgtcggaccagtaata 
atggtcattgggttaagtttagctcctacagcagtaaacatggccatgttcgaaaattct 

gctgaaatgaaagggtataacttaagttacttaattgttgctttgattacattagcagta
accatcatcgtccaaggattcttcaaaggatttttatcactaatacctgtacttataggt 
attatagtgggatatattgtatccattttcatgggcatagttaaatttgctccaatagca
caagcgaaatggatagattttcctcatatttatctaccatttaaagattacacaccatct 
tttcatttaggactcattctcgtgatgatacccgtggtgtttgtgacggtaagtgaacat 
attggtcatcaaatggtaattaataaaatagtaggacgcaatttctttgaaaatccaggt
ttagataaatcaatcattggtgatggtgtttcaactatgtttgcaagtatgataggaggt 
cctcctagtacaacttatggtgaaaatataggtgtactagcgatcaccaaaatatatagt 
atttacgttattggtggtgcggcagttatagctatcattcttgcatttattggtaagttc 
actgctttaatatcttcaatacctacgccagtgatgggtggtgtctcaattttattattc
ggtattatagcagctagtggtttaagaatgcttgttgaaagtcaagtagatttcgcaagc 
aatcgcaacttggttatagcatcagttgtgcttgttgtcgggattggtaatcttcttatc 
aatttaaaaggcataggtatcaatttacaaattgaaggaatggcattatcagcactttca 
ggaataatattaaatttaattttgccaaaagataaaaaccaaataaattaa 

Sequence 2600 

MLVALFMSGLMYVIIGIFIKLSGTHWLMHLLPPVVVGPVIMVIGLSLAPTAVNMAMFENS
AEMKGYNLSYLIVALITLAVTIIVQGFFKGFLSLIPVLIGIIVGYIVSIFMGIVKFAPIA 
QAKWIDFPHIYLPFKDYTPSFHLGLILVMIPVVFVTVSEHIGHQMVINKIVGRNFFENPG 
LDKSIIGDGVSTMFASMIGGPPSTTYGENIGVLAITKIYSIYVIGGAAVIAIILAFIGKF
TALISSIPTPVMGGVSILLFGIIAASGLRMLVESQVDFASNRNLVIASVVLVVGIGNLLI
NLKGIGINLQIEGMALSALSGIILNLILPKDKNQIN* 

Sequence 2601 
Contig_0778_pos_3073_3954,

is similar to (with p-value 2.0e-90) 

>sp:sp|P05654|PYRB_BACSU ASPARTATE CARBAMOYLTRANSFERASE (EC
2.1.3.2) (ASPARTATE TRANSCARBAMYLASE) (ATCASE). >pir:pir|A25
015|OWBSAC aspartate carbamoyltransferase (EC 2.1.3.2) catal
ytic chain - Bacillus subtilis >gp:gp|M13128|BACPYRB_1 B.sub
tilis pyrB gene encoding aspartate transcarbamoylase, comple
te cds. NID: g143383. >gp:gp|M59757|BACPYROP_3 Bacillus subt
ilis pyrimidine biosynthetic (pyr) gene cluster (pyrR, pyrP,
pyrB, pyrC, pyrAA, pyrAB, pyrD, pyrF and pyrE) genes, compl
ete cds. NID: g387576. >gp:gp|Z99112|BSUB0009_20 Bacillus su
btilis complete genome (section 9 of 21): from 1598421 to 18
07200. NID: g2633902.

atggaacacttattatcaatggagcatttatctaattcagaaatttatgatttaattact 
atcgcttgccaattcaaatctggtgagcgaccattacctcaatttaacggtcaatacgta 

tcaaacttattcttcgaaaattcaacgcgaacaaagtgtagctttgagatggcagaacaa
aaattaggattaaaacttattaattttgaaacaagtacatcatctgtaaaaaagggtgag 
tcactttatgacacatgtaaaacacttgaaagtataggtgttgatttacttgtcatacgt 
cactcccaaaattcttattacgaagaactggatcaattaaatattccaattgctaatgca
ggtgatggaagtggacaacatcctactcagagtttattagacataatgacaatatatgaa 

gaatatggttcgtttgaaggtttgaatattctaatatgtggggacattaaaaattctcgt
gtcgcaagaagtaattatcatagtttaacatcattaggtgccaacgtaatgttctcaagt 
ccaaaagaatgggtagataatacattagaggcgccttatgttgaaattgatgaagtcatt 
gataaagtagatattgttatgttgcttagagttcaacatgaaagacatggaatttcaggt 
gaagctaactttgctgctgaagaatatcatcaacaatttggtttaacacaggctagatat 

gataaattaaaagaggaagccattgtaatgcatccagctcctgtaaatagaggtgttgaa
attaaaagcgagctagttgaagcacctaagtctcgaatatttaagcagatggaaaatgga 
atgtatttaagaatggcagtaataagtgcgcttttacaatag 

Sequence 2602

MEHLLSMEHLSNSEIYDLITIACQFKSGERPLPQFNGQYVSNLFFENSTRTKCSFEMAEQ
KLGLKLINFETSTSSVKKGESLYDTCKTLESIGVDLLVIRHSQNSYYEELDQLNIPIANA 
GDGSGQHPTQSLLDIMTIYEEYGSFEGLNILICGDIKNSRVARSNYHSLTSLGANVMFSS 
PKEWVDNTLEAPYVEIDEVIDKVDIVMLLRVQHERHGISGEANFAAEEYHQQFGLTQARY 
DKLKEEAIVMHPAPVNRGVEIKSELVEAPKSRIFKQMENGMYLRMAVISALLQ*

Sequence 2603 

Contig_0778_pos_3972_5249,

is similar to (with p-value 0.0e+00)

>sp:sp|P46538|PYRC_BACCL DIHYDROOROTASE (EC 3.5.2.3) (DHOASE
). >pir:pir|S34319|S34319 dihydroorotase (EC 3.5.2.3) - Baci
llus caldolyticus >gp:gp|X73308|BCPYR_2 B.caldolyticus pyrim
idine biosynthesis genes. NID: g312439.

atgaaattaattaaaaacggaaaaatcttaaaaaacggtatcctaaaagacacagaaatt 
ttaatcgacggtaaacgtattaaacaaattagtagtaaaattaatgcttcatcttcaaat
attgaagttattgatgcaaaaggaaatttaattgctcccggttttgtagatgttcatgtg 
cacctacgtgaaccaggtggtgaacataaagaaacaattgaaagtggtacaaaagccgct 
gcaagaggtggttttactacagtatgtcctatgcctaatacaagacctgtaccagataca 
gttgaacatgttagagaattaagacaacgaatttctgaaacagcacaagttagggtgttg 

ccttatgctgctattactaagagacaagcaggtactgaacttgttgattttgaaaaatta
gcactagaaggtgtgtttgcatttactgacgatggtgtgggagttcaaacagcaagtatg 
atgtatgctgctatgaagcaagctgcaaaagttaaaaaaccgattgtcgcacactgtgaa 
gataatagcttaatctatggtggtgcaatgcataaaggtaaacgtagtgaagaattaggc 
atacctggtattccaaatattgctgaatctgtacaaattgctagagatgtattattggct 

gaagcaactggttgtcactatcatgtgtgtcatgtttcaactaaggaaagtgttcgagta
atcagagacgctaaaaaagctggtatccatgtaacagcagaagttacaccacatcattta 
ttattaactgaaaatgatgttcctggcgatgattcaaactacaaaatgaatccaccatta 
agaagtaatgaagatagagaagcacttttagaaggcttattagatggaacaattgattgt 
attgcaacggatcatgcacctcacgctaaagaagaaaaagcacaacctatgacaaaagca 

cctttcggcatcgtaggtagtgaaacagcattcccattactttatacacactttgt.fiaga
cgaggtaattggtcactgcaacaattagttgattatttcactattaaaccagctactatt 
ttcaacttaaattatggaaaattacacaaagatagttacgctgatttaacaataattgat 
cttaatactgaaaaagaaatcaaaagtgaagatttcttatctaaagctgataacactcca 
tttattggtgaaaaagtttatggaaatccaacactaacaatgcttaaaggtgaagtagta 

ttcgaggaggaaaagtag

Sequence 2604 

MKLIKNGKILKNGILKDTEILIDGKRIKQISSKINASSSNIEVIDAKGNLIAPGFVDVHV
HLREPGGEHKETIESGTKAAARGGFTTVCPMPNTRPVPDTVEHVRELRQRISETAQVRVL
PYAAITKRQAGTELVDFEKLALEGVFAFTDDGVGVQTASMMYAAMKQAAKVKKPIVAHCE
DNSLIYGGAMHKGKRSEELGIPGIPNIAESVQIARDVLLAEATGCHYHVCHVSTKESVRV
IRDAKKAGIHVTAEVTPHHLLLTENDVPGDDSNYKMNPPLRSNEDREALLEGLLDGTIDC
IATDHAPHAKEEKAQPMTKAPFGIVGSETAFPLLYTHFVRRGNWSLQQLVDYFTIKPATI
FNLNYGKLHKDSYADLTIIDLNTEKEIKSEDFLSKADNTPFIGEKVYGNPTLTMLKGEVV
FEEEK*

Sequence 2605 

Contig_0778_pos_5250_0, 

is similar to (with p-value 5.0e-51) 

>sp:sp|P25993|CARA_BACSU CARBAMOYL-PHOSPHATE SYNTHASE, PYRIM
IDINE-SPECIFIC, SMALL CHAIN (EC 6.3.5.5) (CARBAMOYL-PHOSPHAT
E SYNTHETASE GLUTAMINE CHAIN). >pir:pir|E39845|E39845 carbam
oyl-phosphate synthase (glutamine-hydrolyzing) (EC 6.3.5.5),
pyrimidine-repressible, small chain - Bacillus subtilis >gp
:gp|M59757|BACPYROP_5 Bacillus subtilis pyrimidine biosynthe
tic (pyr) gene cluster (pyrR, pyrP, pyrB, pyrC, pyrAA, pyrAB,
pyrD, pyrF and pyrE) genes, complete cds. NID: g387576. >g
p:gp|Z99112|BSUB0009_22 Bacillus subtilis complete genome (s
ection 9 of 21): from 1598421 to 1807200. NID: g2633902.

atgcttgaaaaacgttatcttgtactggaagatggctcttattacgaaggatatcgctta
gggtcagatgacttatctataggcgaaattgtattcaacactgctatgacggggtaccaa 
gaaacaatctctgacccgtcttacacaggtcaaatcataacttttacgtacccactaatt 
ggaaactatggtattaatcgcgatgattttgaatcattaacacctaaattaaatggggta 
gtagtaaaggaagcaagtacacaccctagtaactttagacaccaaaaaactttacacgaa 

acacttgctcaatatcatattcctggtatatcgggggtagatactagaagtattactcgt
aaaattagaaattatggtgttttaagagctggatttacagataataaagataacattcag 
gaacttgttgaacagttgaaaactgctgaattacctagagatgaagttcaaacggtttct 
acaaaaacaccatatgtatcaacaggttccgatttaagcgtcgttttactcgactttggt 
aaaaagc 
Sequence 2606

MLEKRYLVLEDGSYYEGYRLGSDDLSIGEIVFNTAMTGYQETISDPSYTGQIITFTYPLI 
GNYGINRDDFESLTPKLNGVVVKEASTHPSNFRHQKTLHETLAQYHIPGISGVDTRSITR 
KIRNYGVLRAGFTDNKDNIQELVEQLKTAELPRDEVQTVSTKTPYVSTGSDLSVVLLDFG
KKX 

Sequence 2607 

Contig_0779_pos_3369_1672,

is similar to (with p-value 0.0e+00)

>sp:sp|P18255|SYT1_BACSU THREONYL-TRNA SYNTHETASE 1 (EC 6.1.
1.3) (THREONINE--TRNA LIGASE) (THRRS). >pir:pir|B37770|YSBST
1 threonine--tRNA ligase (EC 6.1.1.3) 1 - Bacillus subtilis
>gp:gp|AF008220|AF008220_195 Bacillus subtilis rrnB-dnaS gen
omic region. NID: g2293135. >gp:gp|M36594|BACTRNASB_1 B.subt
ilis threonyl-tRNA synthetase (thrSv) gene, complete cds. NI
D: g143765. >gp:gp|Z99118|BSUB0015_160 Bacillus subtilis com
plete genome (section 15 of 21): from 2795131 to 3013540. NI
D: g2635200. >gp:gp|Z75208|BSZ75208_5 B.subtilis genomic seq
uence 89009bp. NID: g1769994.

atggcacaagcattaaaacgtttatacggagacgttaaatttggagttggacctgtaata 
gaaggcggattctattatgattttgatatggatgataaggtttcatcggatgattttgat 
aaaattgagaaaacaatgaaacaaattgtgaacgaaaatcataaaattgtaagagaagta 
gttagtaaagaaaaagcaaaagacttcttcaaggatgacccttataaattagaacttatt 

gatgcaattcctgaagatgagagtgtaacactttatactcaaggtgaatttactgattta
tgtcgaggtgtacacgtaccttctacttctaaaattaaagagttcaaactattatctaca 
gctggtgcttattggcgtggaaatagtgataataaaatgttacaacgaatttatggtaca 
gcattctttgacaaaaaagatttgaaagcacatctaaaaatgttggaagaacgtcgtgag 
cgtgatcatcgtaaaattggtaaagatttagaattgtttacaaacaatcaactcgttggt 

gctggtttaccattatggttaccaaatggtgctacaatacgtagggaaatagaacgttat
attgtcgataaagaagtaagtatgggatacgatcatgtttacacaccagtattagccaat 
gtLgatt.t,atataaaacatctggtcactgggatcattatcaagaagatatgttcc'.:agca 
atgaagttagatgaagacgaagcaatggtcttaagaccaatgaactgtccacatcatatg 
atgatttataaaaacaaacctcattcttatcgcgaattacctatacgtattgctgaattg 

ggtactatgcatcgttacgaagcaagtggtgcagtatcaggtttacaacgtgttcgagga
atgacattgaatgattcccatattttcgttagacctgatcaaattaaagaagaatttaaa 
cgtgtagttaatatgattcaagatgtgtacaaagattttggttttgaagattatcgcttc 
agattgagttatagagatcctgaagataagcataagtactttgatgatgatgaaatgtgg 
gaaaaagctgaatccatgcttaaagaagcatcagatgaattaggtttaacttatgaagaa 

gctattggtgaggcagcattctatggacctaagttagatgttcaagtaaaaacagctatg
ggaaaagaagaaactctatcaacagcacaacttgattttcttttaccagaacgttttgac 
ttaacgtacattggtcaagatggagaacaacatcgtcctgtagttatacaccgtggtgta 
gtttctactatggaacgttttgttgcatttttaacagaagaaacaaaaggtgcatttcca 
acttggttggcgcctatgcaagttgaaattattcctgtaaatatagatttacattatgat 

tatgcaagacttttacaagatgaactaaaatcgcaaggtgtccgcgttgaaattgatgac
cgtaatgaaaaaatgggatataaaattcgtgaagctcaaatgaaaaaaataccttatcag 
attgttgtaggtgaccaagaagtagagaatcaagaagtaaatgtaagaaaatatggttct 
gaaaaacaagaatcagttgaaaaagatgaatttatttggaatgttattgatgaaatccgt 
ttgaaaaagcatagataa 

Sequence 2608 

MAQALKRLYGDVKFGVGPVIEGGFYYDFDMDDKVSSDDFDKIEKTMKQIVNENHKIVREV
VSKEKAKDFFKDDPYKLELIDAIPEDESVTLYTQGEFTDLCRGVHVPSTSKIKEFKLLST 
AGAYWRGNSDNKMLQRIYGTAFFDKKDLKAHLKMLEERRERDHRKIGKDLELFTNNQLVG 
AGLPLWLPNGATIRREIERYIVDKEVSMGYDHVYTPVLANVDLYKTSGHWDHYQEDMFPA
MKLDEDEAMVLRPMNCPHHMMIYKNKPHSYRELPIRIAELGTMHRYEASGAVSGLQRVRG 
MTLNDSHIFVRPDQIKEEFKRVVNMIQDVYKDFGFEDYRFRLSYRDPEDKHKYFDDDEMW 
EKAESMLKEASDELGLTYEEAIGEAAFYGPKLDVQVKTAMGKEETLSTAQLDFLLPERFD 
LTYIGQDGEQHRPVVIHRGVVSTMERFVAFLTEETKGAFPTWLAPMQVEIIPVNIDLHYD
YARLLQDELKSQGVRVEIDDRNEKMGYKIREAQMKKIPYQIVVGDQEVENQEVNVRKYGS
EKQESVEKDEFIWNVIDEIRLKKHR* 

Sequence 2609
Contig_0781_pos_5154_4732,

is similar to (with p-value 6.0e-49) 

>gp:gp|D88209|D88209_1 Bacillus licheniformis DNA for Pz-pep
tidase, complete cds. NID: g1651215.

atgtttgctgaatttgaacataaaatacatcaaatagaagaagctggggagccgttaacg 
ccaaatcgtatgaatgaagaatatgctaaactgaacaaactatattttggtgaagcagta
gaaactgacgatgatattagtaaagaatggtcacgtattcctcatttctatatgaattat 
tatgtatatcaatacgcaactggttatagtgcagctcaaagtttaagtcatcaaatttta 
actgagggtcaacctgctgttgaacgatatatcaatgaattcttaaaaaagggtagctca 
aactatccgattgaaattttaaaaaatgcaggtgttgacatgacaacacctcaaccaata 
gaggaagcttgtgaagtattcgaacaaaaattagatgcttttgaaaagttaatgaaagct
tag 

Sequence 2610 

MFAEFEHKIHQIEEAGEPLTPNRMNEEYAKLNKLYFGEAVETDDDISKEWSRIPHFYMNY 
YVYQYATGYSAAQSLSHQILTEGQPAVERYINEFLKKGSSNYPIEILKNAGVDMTTPQPI
EEACEVFEQKLDAFEKLMKA* 

Sequence 2611 

Contig_0781_pos_3771_3247,

putative peptide of unknown function

atgtatgttggtacgagtaaaagtagtgttttaaatctaactgcagttcttaaaggtaaa 
gatatttatcatgtatatgctgaatataagtctccatattataagcaatatggtaaatca 
gaagcccctacaatatatgatgataatattacaagtcaatcagagttaaagaagaaatta 
aaagaaacacttgatgacataccaacaatcgaagtagcaacgaattatttaggattagaa 

agtattcatgaaaataatactattcgatttatacacaaacctatcggatttaatactgat
ttaaaagttgtcaaacttactgaatatcacccccttgtttcgcagcctattgaagtggaa 
ttcagtaatgcacagaaagatattataaaaatgcaatcacagttcaatcgtaggttaaga 
aaggttaataatcttatgaaaaaaggattcaaaactagtgactattctttaaatgtgtta 
gaggaatataacgaaacagtaggaagtgtattgattgatgagtaa 

Sequence 2612 

MYVGTSKSSVLNLTAVLKGKDIYHVYAEYKSPYYKQYGKSEAPTIYDDNITSQSELKKKL 
KETLDDIPTIEVATNYLGLESIHENNTIRFIHKPIGFNTDLKVVKLTEYHPLVSQPIEVE 
FSNAQKDIIKMQSQFNRRLRKVNNLMKKGFKTSDYSLNVLEEYNETVGSVLIDE* 

Sequence 2613 

Contig_0781_pos_3065_1200, 

putative peptide of unknown function

atgttattaactttagactttcctattcaaataggacacacatttagaaccaagatgata
aataattttagaacaatacttaattattataatgaattagatcatcagcatcgcgcacac
acagaaactaagcatcatgcacatcaagccatgcaggttgattatagaaatacaaacgtt 
tctgcatttttagattatcttaacggtaatattaatgggcttgttttaggagcaaatgga 
gacggtatagctgaaacaaaacaagccagagtatcaatagatggtaccgtacatcccttg 
ttgcaagaaaggctgcttcatgactttttaggaattaacagaaaattagataaagaaata 
cattctaatggtgcagttgattttatttggaatcctccatatataccaggaaatagattg
ggagaaaatgggacaccaaataattgggaaccagaagcccatattgaagcgtttttaaac 
cctttagttgataatcaatacgttacaaaagaagttataggagaagatacatcaggaaaa 
tataatgtgtacaaatttacgtttgaaccacaaaattacaataaaacgttacttattact
tcatgtatacacggtaatgaaactactggattttttgatatgtgccatatactcaatcta 
ttggtcaatcaatgggaaaagtatcctcaattaacttacttaagaaaaaatgtacgttta
atttatgttcctatggttaacccgtggggattcgcaaatcaagaaagagagaatgtgaac 
aatgtagatttaaacagaaattttgattataactggaaggcaggtaaagggacagatcct 
gataaatctaacttcaaaggtaaaagtcctttttctgaaaaagaatcacaaaatatgcgt 
agcttagttcaaagtatagataatttaactgctcacttagatttgcatgatattatttca 
gtaaataatgattactgtttattttatccgcgttgggccaatcaaaaaaataataatatg
actcatcttattaacaatttaaaaagtaacggagacctcgttgtttggggttccagtaca 
ttatcatcttttagtaattgggtaggaatccgaaataaaacaacgtcatatctttcagaa 
ataaatgaaaaacgtgtcggtgaaaagaaaagtcccgaagaaatgagacgttcagtacgc 
tgggtaggtaatgtaatttttagaatggcacaatttgaatcttatcaaaatggtcaaaca
tcattagatcctttcattaaagtgatggtatatgatgatagatttaacaataaaacatct 
gaagtcattaccctacgtgcagaaaggaatgaatggcaacgtataatgatgagtcagcag 
cgtttcaaagttttagcaaatggatttgtagagctctatggatatgtgactataaacgtt 
gatagagatgtcacagtggggattaatcctaatattgttcagaattatcatccattcttt 

ggatttaataaaagtagaaaacgtaatttattttcaattgaacatagactcaacaaagga
aatacaactttccctatttacgctgctgctggagttcaaatgtcgacgattactgaacca 
ggtacaaaacgtactgatacagtaatgccggtactagatgttaagaaaaaaggtgctggt 
attgtaacaatcaaacaaattaaattatttgcgaagttcactcctacgcattctgctaat 
tccattcagatattaaaatctggagaatattatttttatttttttctaacgctttgcgat 

atcg. tr-.-.tcattatcccacatttcttgaatttgtggttctttattcggtaagccacctc
tcatag 

Sequence 2614 

MLLTLDFPIQIGHTFRTKMINNFRTILNYYNELDHQHRAHTETKHHAHQAMQVDYRNTNV 
SAFLDYLNGNINGLVLGANGDGIAETKQARVSIDGTVHPLLQERLLHDFLGINRKLDKEI
HSNGAVDFIWNPPYIPGNRLGENGTPNNWEPEAHIEAFLNPLVDNQYVTKEVIGEDTSGK 
YNVYKFTFEPQNYNKTLLITSCIHGNETTGFFDMCHILNLLVNQWEKYPQLTYLRKNVRL 
IYVPMVNPWGFANQERENVNNVDLNRNFDYNWKAGKGTDPDKSNFKGKSPFSEKESQNMR
SLVQSIDNLTAHLDLHDIISVNNDYCLFYPRWANQKNNNMTHLINNLKSNGDLVVWGSST 
LSSFSNWVGIRNKTTSYLSEINEKRVGEKKSPEEMRRSVRWVGNVIFRMAQFESYQNGQT
SLDPFIKVMVYDDRFNNKTSEVITLRAERNEWQRIMMSQQRFKVLANGFVELYGYVTINV 
DRDVTVGINPNIVQNYHPFFGFNKSRKRNLFSIEHRLNKGNTTFPIYAAAGVQMSTITEP 
GTKRTDTVMPVLDVKKKGAGIVTIKQIKLFAKFTPTHSANSIQILKSGEYYFYFFLTLCD 
IAHHYPTFLEFVVLYSVSHLS*

Sequence 2615

Contig_0784_pos_1894_2202,

putative peptide of unknown function 

gtgactaaagcttctaaagattttgtagtaaacattaacatatgtcctccgaatagatct
tggacagcatttgtaatcagtactccaggtacaatcggcatgaccgctgcaatgataatt
gtagctaaatctccgctaggaacaaatgcatgtccaattacagaaataatacctattacc 
aaagaacctatgaattctggaataaattgtgcgtgtagctttcgatctaatatttctact 
actaagtatccaatcgttccagctaatacagctgtgatgatatcaaccagacgacctccc 
tgtagatag 

Sequence 2616 

VTKASKDFVVNINICPPNRSWTAFVISTPGTIGMTAAMIIVAKSPLGTNACPITEIIPIT
KEPMNSGINCACSFRSNISTTKYPIVPANTAVMISTRRPPCR* 



Sequence 2617

Contig_0784_pos_4022_5011, 

is similar to (with p-value 0.0e+00)

>pir:pir|S39743|S39743 hypothetical protein - Bacillus subti
lis 

atggctgatttattatctgtattacaagacaaattatccgggaaaaatgtaaaaatagta
ttacctgaaggtgaagatgaacgagtgctcattgctgcgactcagctacaaaaaactgac 
tatgtttcacctatcgttctaggaaacgaagataatattaaatctcttgcttctaaacac 
gctttagatttaactcaaattgaaatcatagatccagcaacgagtgaacttaaagatgag 
cttgtagatgcttttgttgaaagacgtaaaggtaaggcaactaaagaacaagcagttgaa 

ttattagataatgtaaattatttcggaacaatgcttgtgtatactggaaaggctgaaggt
ttagtgagtggtgctgcacattctactggagatacagtcagaccagcattacaaattatc 
aaaactaaacctggtgtatctagaacatctggtattttctttatgattaaaggcgacgaa 
caatatatttttggagattgtgcgattaatccagaattagatgctcaaggacttgctgaa 
attgcagtagagagtgctaaatcagcacaaagctttggaatggaccctaaagtagctatg 
ttaagcttttctacaaaaggttctgctaaatcggatgatgttactaaagtgcaagaagca
ttgaagttagctcaagaaaaagctgaagcagatcaattagatcatgtagttattgatgga 
gaattccaatttgacgctgctattgttcctagcgtagcagagaagaaagcacctggtgca 
aaaattcaaggtgatgcaaatgtattggttttccctagtctagaagcaggtaatattggt 
tataagattgctcaacgtttaggtggatacgatgcagtaggaccagtcctacaaggatta
aactctccagtcaatgatttatctcgtggttgctcaactgaagacgtttataacttatct 
attattacagctgctcaagctttacaataa 

Sequence 2618 

MADLLSVLQDKLSGKNVKIVLPEGEDERVLIAATQLQKTDYVSPIVLGNEDNIKSLASKH
ALDLTQIEIIDPATSELKDELVDAFVERRKGKATKEQAVELLDNVNYFGTMLVYTGKAEG 
LVSGAAHSTGDTVRPALQIIKTKPGVSRTSGIFFMIKGDEQYIFGDCAINPELDAQGLAE 
IAVESAKSAQSFGMDPKVAMLSFSTKGSAKSDDVTKVQEALKLAQEKAEADQLDHVVIDG
EFQFDAAIVPSVAEKKAPGAKIQGDANVLVFPSLEAGNIGYKIAQRLGGYDAVGPVLQGL 

NSPVNDLSRGCSTEDVYNLSIITAAQALQ*

Sequence 2619 

Contig_0784_pos_5083_5850, 

is similar to (with p-value 5.0e-66)

>sp:sp|P39648|YWFL_BACSU HYPOTHETICAL 31.4 KD PROTEIN IN PTA
3'REGION. >pir:pir|S39745|S39745 hypothetical protein - Bac
illus subtilis >gp:gp|X73124|BSGENR_91 B.subtilis genomic re
gion (325 to 333). NID: g413923. >gp:gp|Z99123|BSUB0020_60 B
acillus subtilis complete genome (section 20 of 21): from 37
98401 to 4010550. NID: g2636240.

atgcaatcttttgcgtttgatgacactttttccgaaagcgttggtaaagatttatcttgt
aatgtagtacgaacgtggatacatcaacacaccgtgattttgggcattcatgattcgcgt 
ttaccatttttaagtgatggtattcgttttcttacagatgaacaaggatataatgcaatt 
gttaggaattctggtggcttgggtgtcgtattagatcaaggaattttaaacatatctttg 

atttttaaaggacaaaccgaaacgactattgatgaagcctttacagtgatgtatttattg
attaataaaatgtttgaggatgaagatgttagtatcgatactaaagaaattgagcaatcg 
tattgcccaggaaaatttgatttaagtattaatgataagaaatttgccgggatttcgcag 
cgacgagtacgtggtggtatcgcagtgcaaatatacttatgtattgaaggttctggctca 
gaacgggcattaatgatgcaacagttttatcaacgtgcgcttaaaggggagactactaaa 

tttcactatccagacatagatccctcatgtatggcatctttagaaacccttttaaataga
gaaattaaagtgcaagatgttatgtttttattattatatgcactaaaagatttaggggca 
aacttaaatatggatcctattacagaagacgagtggacacgttacgaagggtattatgat 
aagatgttagaacgcaatgcgaaaatgaatgaaaaattagatttttag 

Sequence 2620

MQSFAFDDTFSESVGKDLSCNVVRTWIHQHTVILGIHDSRLPFLSDGIRFLTDEQGYNAI 
VRNSGGLGVVLDQGILNISLIFKGQTETTIDEAFTVMYLLINKMFEDEDVSIDTKEIEQS 
YCPGKFDLSINDKKFAGISQRRVRGGIAVQIYLCIEGSGSERALMMQQFYQRALKGETTK 
FHYPDIDPSCMASLETLLNREIKVQDVMFLLLYALKDLGANLNMDPITEDEWTRYEGYYD 

KMLERNAKMNEKLDF*

Sequence 2621 

Contig_0784_pos_3890_3102,

is similar to (with p-value 9.0e-94) 

>sp:sp|P39645|YWFI_BACSU HYPOTHETICAL 29.5 KD PROTEIN IN ROC
C-PTA INTERGENIC REGION. >pir:pir|S39742|S39742 hypothetical
protein - Bacillus subtilis >gp:gp|X73124|BSGENR_88 B.subti
lis genomic region (325 to 333). NID: g413923. >gp:gp|Z99123
|BSUB0020_63 Bacillus subtilis complete genome (section 20 o
f 21): from 3798401 to 4010550. NID: g2636240.

atgaatccaataaaaattttgaaaaaggagcgaataataatgagtgaagcagcagaaact 
ttagatggttggtatagcttacatttattttatgcagtagactggacaacttttcgttta 
attgctgaagatgatcgtgaagcaatgattactgaattggaaacatttattaaagataaa 
acagttgctagagaatcacatcaaggtgatcatgcaatttataacattacaggtcaaaaa 
gcggaccttttactatggtttttacgtccagaaatgaaagagttaaatcaaattgaaaat
gagtttaataaattacgtatcgcagactatctcattccaacttattcctatgtgtcagtg 
atagaattaagtaattatttagcaggcaaatctgatgaggatccttatgaaaatccccac 
gttaaggcacgattataccctgaattaccacattctgaatatatatgtttctatccaatg 
gataaacgacgcaatgaaacttataactggtatatgttacctatcgaagaccgtaaaact
ttaatgtataaccatgggatgataggtcgtaaatatgctggtaaaatcaaacagtttatt 
acaggttcagtaggttttgatgactatgagtggggtgttacattattttcaaatgatgta 
cttcaattcaaaaaaattgtctatgaaatgcgttttgatgaaacgactgctcgttatggc
gaatttggtagtttctatattggtcacattctaaacatcgaagacttcaaacaatttttt 
agtatataa

Sequence 2622 

MNPIKILKKERIIMSEAAETLDGWYSLHLFYAVDWTTFRLIAEDDREAMITELETFIKDK 
TVARESHQGDHAIYNITGQKADLLLWFLRPEMKELNQIENEFNKLRIADYLIPTYSYVSV 
IELSNYLAGKSDEDPYENPHVKARLYPELPHSEYICFYPMDKRRNETYNWYMLPIEDRKT
LMYNHGMIGRKYAGKIKQFITGSVGFDDYEWGVTLFSNDVLQFKKIVYEMRFDETTARYG 
EFGSFYIGHILNIEDFKQFFSI* 

Sequence 2623 
Contig_0784_pos_2495_1845,

putative peptide of unknown function 

atggcacgtattgctacaaaattgggctatcctgaaagtaatagtttcgtgactaatact 
gtaattgaatttgttttacataatgaagcatatcctcggttgtatagaattaaaactcga 
gatacgaacttaataaaaatttctcaagctaatgaaatctcacgtcaaattacaaatggc 

acaatgacgcttgaagaagctaagtatcaattagaggaaatatatgttgctaaaagagat
agcagtctcccttttaaaggaattgccgcagcaattatcgctacgagcttcctctatcta 
cagggaggtcgtctggttgatatcatcacagctgtattagctggaacgattggatactta 
gtagtagaaatattagatcgaaagctacacgcacaatttattccagaattcataggttct 
ttggtaataggtattatttctgtaattggacatgcatttgttcctagcggagatttagct 

acaattatcattgcagcggtcatgccgattgtacctggagtactgattacaaatgctatc
caagatctattcggaggacatatgttaatgtttactacaaaatctttagaagctttagtc 
accgcctttggtataggcgctggtgtaagttcaatattaattttagtctag 

Sequence 2624 

MARIATKLGYPESNSFVTNTVIEFVLHNEAYPRLYRIKTRDTNLIKISQANEISRQITNG
TMTLEEAKYQLEEIYVAKRDSSLPFKGIAAAIIATSFLYLQGGRLVDIITAVLAGTIGYL 
VVEILDRKLHAQFIPEFIGSLVIGIISVIGHAFVPSGDLATIIIAAVMPIVPGVLITNAI
QDLFGGHMLMFTTKSLEALVTAFGIGAGVSSILILV* 

Sequence 2625

Contig_0786_pos_2219_1125, 

is similar to (with p-value 1.0e-53)

>sp:sp|P39604|YWCF_BACSU HYPOTHETICAL 43.3 KD PROTEIN IN QOX
D-VPR INTERGENIC REGION. >pir:pir|S39697|S39697 hypothetical
protein - Bacillus subtilis >gp:gp|X73124|BSGENR_43 B.subti
lis genomic region (325 to 333). NID: g413923. >gp:gp|Z99123
|BSUB0020_107 Bacillus subtilis complete genome (section 20
of 21): from 3798401 to 4010550. NID: g2636240.
atgggtggcggacaatatagtgcaaatttcagtattagacaaattatttattatattttc 

ggagctattattgcatttttaattatgataatttcaccgaaaaaaattaaaaataatact
tatattttatacagtatattttgcgttctattaatagggttacttattttacctgaaact 
tcaatcaccccaattattaatggtgctaaaagttggtacagtttcggtcctataagcatc 
caaccttccgaattcatgaaaattatacttatacttgctttggctaaaacgatatctaaa 
cataaccaatttacatttaataagtcttttcagtctgatttaatgttattttttaaaatt 

ttaggtgtatccattatacctatggcattaattctattgcaaaatgacctaggtactact
ttggtgttatgtgcaattatagctggcgtcatgttagtaagtggaataacatggaggata 
tti'vgcccctctttttattgttgcatttgtaagtggttctagtattatattagctal. -att 
tataaaccatccttaatagaaaacctattaggaataaaaatgtatcaaatgggacgtatc 
aattcttggttagatccctattcatacagtagtggagatggatatcacttaacagaatct 
ttaaaagctattggttcaggtcaattattaggtaaaggttataaccatggcgaagtttat
attcctgagaatcataccgactttattttttcagtgattggagaagaaatgggctttata 
ggttcagtattattgatattacttttcttattcttaatatttcaccttatacggttagct 
agtaaaattgatagtcagtttaacaaagtatttatcataggatatgtatcgttgattgtt 
tttcacgtgttacaaaatatcggcatgacggttcaattattaccgattacaggtatacca
cttccgtttattagttacggtggaagttctttatggagtttaatgactggtataggagta 
gttctttcaatttattatcatgaaccccaaagatatgaaataaccacattatctaaaaaa 
tctaatacaatttaa 

Sequence 2626

MGGGQYSANFSIRQIIYYIFGAIIAFLIMIISPKKIKNNTYILYSIFCVLLIGLLILPET 
SITPIINGAKSWYSFGPISIQPSEFMKIILILALAKTISKHNQFTFNKSFQSDLMLFFKI 
LGVSIIPMALILLQNDLGTTLVLCAIIAGVMLVSGITWRILAPLFIVAFVSGSSIILAII 
YKPSLIENLLGIKMYQMGRINSWLDPYSYSSGDGYHLTESLKAIGSGQLLGKGYNHGEVY 

IPENHTDFIFSVIGEEMGFIGSVLLILLFLFLIFHLIRLASKIDSQFNKVFIIGYVSLIV
FHVLQNIGMTVQLLPITGIPLPFISYGGSSLWSLMTGIGVVLSIYYHEPQRYEITTLSKK 
SNTI* 

Sequence 2627 
Contig_0787_pos_1817_2143,

putative peptide of unknown function 

atgattttctttatcattaactttattcttatttcatatttgtacgataaacaatatgta 
ccttttcaagcaattaccggtataagcttgttttattgcttagttatatttccaataaca 
ctcattttatatgtgcgtattgccaaaaaaaattatctatacagtaataagtatgaaatg 
agaactggaataatcattggtattattgctttaattctagtaattatgcaagggtttcac
tttaactgggctatagatttttttaataatgttgtatggtggtcatttaaaagtaccatc 
tcaattgatagacgaatcgcaacatag 

Sequence 2628

MIFFIINFILISYLYDKQYVPFQAITGISLFYCLVIFPITLILYVRIAKKNYLYSNKYEM
RTGIIIGIIALILVIMQGFHFNWAIDFFNNVVWWSFKSTISIDRRIAT* 

Sequence 2629

Contig_0787_pos_6394_6038,

putative peptide of unknown function

atgccagaaaatattgtaagaacaaaagggattgtatggttagcgcagtataatgatgta 
gcgtgtttgttatcacaggctggttcatcttgtaatattcaccccgttacatactgggtg
gcaacaatgagtgaaagtcaacagcaagctattttggaggcgcgtcaagatgtagtagaa 
gattgggatatcgaatatggagatcgtcaaacgcaatttgtaattattggtacggattta

gatcaagaaaaaatttcccgggaattagatgcatgcttaattcatagtagtgagattgat
gaagattggcgattactagatagtccgtatcaatggacttatgatcgacgaatgtaa 

Sequence 2630 

MPENIVRTKGIVWLAQYNDVACLLSQAGSSCNIHPVTYWVATMSESQQQAILEARQDVVE
DWDIEYGDRQTQFVIIGTDLDQEKISRELDACLIHSSEIDEDWRLLDSPYQWTYDRRM*

Sequence 2631 

Contig_0787_pos_4438_2969,
is similar to (with p-value 0.0e+00)
>sp:sp|P39755|NDHF_BACSU NADH DEHYDROGENASE SUBUNIT 5 (EC 1.
6.5.3) (NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 5). >gp:gp|U283
23|BSU28323_1 Bacillus subtilis NADH dehydrogenase subunit 5
(ndhF) gene, complete cds. NID: g903586. >gp:gp|Z99104|BSUB
0001_183 Bacillus subtilis complete genome (section 1 of 21)
: from 1 to 213080. NID: g2632267. >gp:gp|Z99105|BSUB0002_11
Bacillus subtilis complete genome (section 2 of 21): from 1
94651 to 415810. NID: g2632457. >gp:gp|AB006424|AB006424_9 B
acillus subtilis genomic DNA, 70 kb region between 17 and 23
degree. NID: g3599592.
atgttttttattacacttgtcattgctatactgagtggattaatatttttgaatcatcga
gttcccattcaatatattaaatttcatatatatttacttgttttacctatcattacggga 
ttaagtggattaatatttttcggtgaaagggcgaatgttggaccttttgtagttgatcat 
cttacttggttaatgatgacatttattttgactttaggctttatcattcaaaagttttct 
atgcgttatttaattggcgacatgcattaccgtaaatattttccgttttttacattaatt
actgcatttgcttcattggcatggttaagtggagacttaaggttaatgaccatgttttgg 
ggtgcaacattatttgtgttaacacggctcattaaagttaacaaattatggaaggtgcct 
agggaagcagcaagaatttcagcttggtcatttatattggcatggttgtcgttattgatt 
gctgtcattttattgtatatcgctacaggagattggtatatttattcgaatatgtcagat 

gataatgcaatcaattatggaatgcgtctctgtatcaatttacttattgttttagctgtg
attattccggcggcacaatttccatttcaaggctggcttattgaatctgtagctgcgcct 
acgccagtttcagctattatgcacgctggtattgttaatgctggtggcgttattcttaca 
cgcttttctccggtatttaatgacgaaatagccatttcactgttattaattattgcaagt 
atttcagtattgttgggttctggaatcagtcttgtgcatgttgattacaagagacaactt 

gtaggttctacgataagtcaaatgggttttatgttagtacaatgtgcgcttggggcatat
tctgcggcgatagtacatttaatattgcatggtgtgtttaaagcgacattatttctacaa 
tcgggttctgttgttaaaagatttaacattcctacgcctccatctgttaaaaaatcatat 
ggctggcttgtatttggtcgtctactagctattcttatagcgataatattttggttgaat 
agtgatagacatgcatatgatgtattaagcgctcttatattagcttggtcgttaatggtg 

tcctggaatcaattagttgcttttagtcatggactcatcggaagagtcatcggagtatgt
atgattattgttgtagtaattgtttatattattacgcatcattatttcttcacgacatta 
agtaacgttgatattcatattgtttcaccaccactcataagtattattctatcgattgct 
attatagttttcggcagtatgttaagcatatgggtatcacggcgaagagaatcaaaggca 
tttgcgaagttatacttgtggcttattaaagtaggagaggctaaaacccaatctatagaa

agtcatccatcatatttaaaacgattttag

Sequence 2632 

MFFITLVIAILSGLIFLNHRVPIQYIKFHIYLLVLPIITGLSGLIFFGERANVGPFVVDH 
LTWLMMTFILTLGFIIQKFSMRYLIGDMHYRKYFPFFTLITAFASLAWLSGDLRLMTMFW 
GATLFVLTRLIKVNKLWKVPREAARISAWSFILAWLSLLIAVILLYIATGDWYIYSNMSD

DNAINYGMRLCIMLLIVLAVIIPAAQFPFQGWLIESVAAPTPVSAIMHAGIVNAGGVILT 
RFSPVFNDEIAISLLLIIASISVLLGSGISLVHVDYKRQLVGSTISQMGFMLVQCALGAY
SAAIVHLILHGVFKATLFLQSGSVVKRFNIPTPPSVKKSYGWLVFGRLLAILIAIIFWLN 
SDRHAYDVLSALILAWSLMVSWNQLVAFSHGLIGRVIGVCMIIVVVIVYIITHHYFFTTL 
SNVDIHIVSPPLISIILSIAIIVFGSMLSIWVSRRRESKAFAKLYLWLIKVGEAKTQSIE
SHPSYLKRF* 

Sequence 2633 

Contig_0787_pos_2951_2058,

putative peptide of unknown function

atgcttcaatcagatatcaatgaattagtcaatcaggctaaacgtgtaattacaccttta 
tcaccgatttcaacatttgctgcccgtaatccgtgggaggggctagaagatgcttcgttt 
gatcaagtggcacgttggttaaaaagtgtgagggatgttgacatttatcctaatgcgtct 
actattcacagagcgattagtaataaagaaatagatttaaaagtatttgaagaacggttg 

gatgaaaatcgtgcgcattataataataggtcactatctgacagtgatatcaacacatat
attcaaagagcgaaaaatttaaaaacgattgaagaaggttactttaatacaaaagataac 
gagaaactggaaaaatgggtacaaactaattttaaggattataagaaaaaagaagatgtg 
atagcgcaaagtgctagtgttttcacaaaggaaggtacacgacttattgatattttaaat 
gctcatatgattaagtggtctaaattatatgttgatgactttcaatcaagttggactatg 

ccaaaaagagaaaaaggattctatcatgcctggcaacgtttagttaaacatgatccatta
ttcacaaaaaaacaacgacttactttagcacatttgccaaatcaagcaaccgaagcaata 
gagtacgcctttcaagaattaggagtaaaagaagaacatcgacaatcatatattgagagt 
catttattatctttaccaggttgggcaggaatcatgtatcatcggtcacagacacaaagt 
aatgatgcgtacttattaacagactatgttgcgattcgtctatcaattgagatggtactt

ttaaatgaccaccatacaacattattaaaaaaatctatagcccagttaaagtga

Sequence 2634 

MLQSDINELVNQAKRVITPLSPISTFAARNPWEGLEDASFDQVARWLKSVRDVDIYPNAS 
TIHRAISNKEIDLKVFEERLDENRAHYNNRSLSDSDINTYIQRAKNLKTIEEGYFNTKDN
EKLEKWVQTNFKDYKKKEDVIAQSASVFTKEGTRLIDILNAHMIKWSKLYVDDFQSSWTM
PKREKGFYHAWQRLVKHDPLFTKKQRLTLAHLPNQATEAIEYAFQELGVKEEHRQSYIES 
HLLSLPGWAGIMYHRSQTQSNDAYLLTDYVAIRLSIEMVLLNDHHTTLLKKSIAQLK* 

Sequence 2635

Con t ig_0 7 8 ljpos_0_Q 1 6 , 

is similar to (with p-value O.Oe+00) 

>pir:pir|A40585|A40585 DNA topoisomerase (ATP-hydrolyzing) (
EC 5.99.1.3) chain B - Staphylococcus aureus >gp:gp|X71437|S
AGYRREC_2 S. aureus genes gyrB, gyrA and recF (partial). NID:
g296393. >gp:gp| 89|STAGYRABA_1 Staphylococcus aureus ge
nes for DNA gyrase A and B, complete cds. NID: g540540.
gtgatggtgaatacattgtcagatgtaaacaacacagataattatggtgctggacagata 
caagttttagaaggtctcgaagcggttcgtaaaagaccgggtatgtatattggttcaact 

tcagaaagagggttgcaccatttagtatgggaaattgttgataatagtattgacgaggca
ttagcaggttatgctagtcatattgaagttgtaattgagaaagacaattggattaaagtt 
actgacaatggccgtggtattcctgttgatattcaagaaaagatgggacgccctgctgtc 
gaagttatcttaactgtacttcacgctggaggtaaattcggaggtggcggatacaaagta 
tctggcggtcttcacggtgttggatcttcagttgttaatgcactctcacaagatcttgaa 

gtttatgtacatcgtaatggcacgatttatcatcaagcctataaacaaggtgtgccacaa
tttgatcttaaagaaattggcgatacagataaaacaggtacagctattcgattcaaagcc 
gataaagaaatctttacagagacaacagtttataactatgaaacacttcaaaagcgtata 
cgtgagcttgctttcttaaataaaggtattcaaattactttaaaagatgaaagagaagag 
gaagttagagaagactcatatcattatgaaggcgggattaaatcctatgtagatttatta 

aatgagaataaagaacctcttcacgatgaacctatatatatccatcagtctaaagacgat
attgaagtggaaattgcacttcaatataacagtggatatgcaaccaacttattaacgtat 
gcgaataatattcatacatacgagggtggtattcaa

Sequence 2636 

VMVNTLSDVNNTDNYGAGQIQVLEGLEAVRKRPGMYIGSTSERGLHHLVWEIVDNSIDEA
LAGYASHIEVVIEKDNWIKVTDNGRGIPVDIQEKMGRPAVEVILTVLHAGGKFGGGGYKV
SGGLHGVGSSVVNALSQDLEVYVHRNGTIYHQAYKQGVPQFDLKEIGDTDKTGTAVRFKA
DKEIFTETTVYNYETLQKRIRELAFLNKGIQITLKDEREEEVREDSYHYEGGIKSYVDLL 
NENKEPLHDEPIYIHQSKDDIEVEIALQYNSGYATNLLTYANNIHTYEGGIQ 

Sequence 2637 

Contig_0788_pos_5197_6411,

is similar to (with p-value 3.0e-17) 

>pir:pir|I64093|I64093 ribosomal protein S14 (rpS14) homolog
- Haemophilus influenzae (strain Rd KW20) >gp:gp|U32762|U32
762_5 Haemophilus influenzae Rd section 77 of 163 of the com
plete genome. NID: g1573797.

gtggatgatgtgacaaaatatggtccagttgatggagatccgatcacgtcaacggaagaa 
attccattcgacaagaaacgtgaattcaatcctgatttaaaaccaggtgaagagcgtgtt 

aaacaaaaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaacccatta
acaggggaaaaagttggcgaaggtgaaccaacagaaaaaataacaaaacaaccagtagat 
gaaatcacagaatatggtggcgaagaaatcaagccaggccataaggatgaatttgatcca 
aatgcaccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaaccctgat 
acaggcgaagtagtcacaccaccagtggatgatgtgacaaaatatggtccagttgatgga 

gatccgatcacgtcaacggaagaaattccattcgacaagaaacgtgaattcaatcctgat
ttaaaaccaggtaaagagcgcgttaaacagaaaggtgaaccaggaacaaaaacaattaca
acaccaacaactaagaacccattaacaggggaaaaagttggcgaaggtgaaccaacagaa
aaagtaacaaaacaaccagtagatgaaatcacagaatatggtggcgaagaaatcaagcca 
ggccataaggatgaatttgatccaaatgcaccgaaaggtagccaagaggacgttccaggt 

aaaccaggagttaaaaatcctgatacaggcgaagtagttactccaccagtggatgatgtg
acaaaatatggtccagttgatggagatccgattacgtcaacggaagaaattccgtttgat 
aaaaaacgcgaatttgatccaaacttagcgccaggtacagagaaagtcgttcaaaaaggt 
gaaccaggaacaaaaacaattacaacaccaacaactaagaaccctatggcgaagaaatct 
aaaatagcaaaagaacaaaaaagacaagaattagtaaataaatattacgagttacgtaaa 
gaattaaaagcaaaaggggactatgaagcattaagaaagttgccaagagattcatctcca
actagattaactagaagatgtaaagtaactggtagacctagaggtgtgttacgtaaattt 
gaaatgtctagaattgcatttagagaacatgcgcataaaggtcaaattccaggtgtaaaa 
aaatctagttggtaa 

Sequence 2638 

VDDVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGEERVKQKGEPGTKTITTPTTKNPL
TGEKVGEGEPTEKITKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPD
TGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGKERVKQKGEPGTKTIT
TPTTKNPLTGEKVGEGEPTEKVTKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPG
KPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPGTEKVVQKG 
EPGTKTITTPTTKNPMAKKSKIAKEQKRQELVNKYYELRKELKAKGDYEALRKLPRDSSP 
TRLTRRCKVTGRPRGVLRKFEMSRIAFREHAHKGQIPGVKKSSW* 

Sequence 2639

Contig_0788_pos_5829_4909,

putative peptide of unknown function 

gtgatttcatctactggttgttttgttactttttctgttggttcaccttcgccaactttt 
tcccctgttaatgggttcttagttgttggtgttgtaattgtttttgttcctggttcacct 

ttctgtttaacgcgctctttacctggttttaaatcaggattgaattcacgtttcttgtcg
aatggaatttcttccgttgacgtgatcggatctccatcaactggaccatattttgtcaca 
tcatccactggtggtgtgactacttcgcctgtatcagggtttttaactcctggtttacct 
ggaacgtcctcttggctacctttcggtgcatttggatcaaattcatccttatggcctggc 
ttgatttcttcgccaccatattctgtgatttcatctactggttgttttgttatttcttct 

gttggttcaccttcgccaactttttcccctgttaatgggttcttagttgttggtgttgta
attgtttttgttcctggttcacctttttgtttaacacgctcttcacctggttttaaatca 
ggattgaattcacgtttcttgtcgaatggaatttcttccgttgacgtgatcggatctcca 
tcaactggaccatattttgtcacatcatccacaggtggagtaactacttcgcctgtatca 
ggatttttaaccccccggcttacctggttgcgttgtttgactacctttcggtgcatttgg 

atcaaattcatccttatggcctggcttgatttcttcgccaccataatgaacgatttcatc
cactggttgttttgttattttttctgttggttcaccttcgccaactttttctcctgtatt 
aggattgacataagttggtgttgttgttgtttcaattcctggttcacctttttggactac 
tttttctgtacctggggctaa 

Sequence 2640

VISSTGCFVTFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSLPGFKSGLNSRFLS
NGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPGLPGTSSWLPFGAFGSNSSLWPG 
LISSPPYSVISSTGCFVIFSVGSPSPTFSPVNGFLVVGVVIVFVPGSPFCLTRSSPGFKS
GLNSRFLSNGISSVDVIGSPSTGPYFVTSSTGGVTTSPVSGFLTPRLTWLRCLTTFRCIW 

IKFILMAWLDFFATIMNDFIHWLFCYFFCWFTFANFFSCIRIDISWCCCCFNSWFTFLDY
FFCTWG* 

Sequence 2641
Contig_0788_pos_3383_2700, 

is similar to (with p-value 2.0e-16)

>sp:sp|P54176|HLY3_BACCE HEMOLYSIN III (HLY-III). >pir:pir|S
59967|S59967 hemolysin III - Bacillus cereus >pir:pir|S52296
|S52296 hemolytic factor - Bacillus cereus >gp:gp|X84058|BCH
MLYSN_1 B. cereus gene for hemolytic factor. NID: g662879.

atgtctcaatcatctaagcgaaaaaacgattctgtggtagaaacttttaaggacatcatt
cctttaacatttggagaagagattggtaatgcagcatcacatggtgctgctgcattactt 
acattatttattcttccatatgcagcagtgcatagttttaacaatggtggcacattagag 
tcaattagtgtgtcagtttatgtgattagtatttttatgatgttcatatcatcaaccatt 
tatcattcaatgcaaaataatacgtctcataaatatatattaaggattattgaccatagt 

atgatttatgtggctatttcaggaacatacacacctgttttgttaagtgttgtcggcggt
tggttaggttggctcgtgacaatattattatggggaacgacattatgggggattttgtat 
aaatcaatagcaactaaagtcaatcatagattaagtctcattgtttatttggtgatggga 
tgggtaggtatcatatttttacctattattattatgcgaacatcatggtggtttattttc 
tttatatttcttggtgggttatcttatactatcggagcatggttttatgcccaaaaaaat 
aaaccttattttcatatgatttggcatatatttattgttcttgcttctttcttacatatg
ataggcattttttattttatgtga 

Sequence 2642 

MSQSSKRKNDSVVETFKDIIPLTFGEEIGNAASHGAAALLTLFILPYAAVHSFNNGGTLE
SISVSVYVISIFMMFISSTIYHSMQNNTSHKYILRIIDHSMIYVAISGTYTPVLLSVVGG 
WLGWLVTILLWGTTLWGILYKSIATKVNHRLSLIVYLVMGWVGIIFLPIIIMRTSWWFIF 
FIFLGGLSYTIGAWFYAQKNKPYFHMIWHIFIVLASFLHMIGIFYFM* 

Sequence 2643

Contig_0788_pos_2461_1124,

is similar to (with p-value 7.0e-20) 

>gp:gp|D50098|D50098_1 Bacillus subtilis macronuclear DNA fo
r multidrug transporter, complete cds. NID: g1856976.

atgaatcttaagtctatcattactgtaatggcactcatactaataatgtttatggcagct
atagaaacatctattatttcattagcattaccaacaataaaaaatagtttgaatgccggt 
aatctagtttcattagtatttaccgtatattttattgctttagtcatagctaaccctatc 
gttggtgaacttatgtctagatttaaaattatttacattgctgttgtaggggtattattg 
tttgccttaggtagtttaatgtcgggattaagtcagacgtttacttttttaattatctct 

cgaacagtacaaggttttggagcaggagttatgatgtcactctcacaaatagttcctaag
ttggcttttgaaattcctttgagatataaaattatgggtatagttggaagtgtttgggga 
atttcgagtattattggcccattattaggtggtgcgattttagagtttgcttcatggcat 
tggctattctatatcaatattcctattgctatagtggcaataatacttgtacttatgact 
tttcattttcctgatgagacacaagtacaacagagtcgttttgatataaaaggattgatt 

atcttttatatctttatagctttattaatgtttggtttactcaaccaacatcatattatt
tttaacatattctcaattattttagctttagctgttttatggctactatttaaaatagaa 
aatagtatcgaacaaccatttcttccaacaaaagaatttaacatatcaatagttctagtt 
tttataacggatttacttattgcgataacactgatgggatataatttatatataccagta 
tatttacaagaaaaacttagtttatcacctttacaaagtggatttgtaatattcccgttg 

tctgttgcttggattacgcttaatttcaatttagctaaaatagaagcgcattttactaga
aaaacattatatatttgctcattttttgttttattagttagtagtctgatgataatgttt 
ggcttaaaactcccattgcttattgcttttgcggttgtttttgcaggtttaagttttggt 
tatatttatacaaaagatagtgttattgtccaagaggaaacttctccaaaaaatatgaaa 
aagatgatgtcattttatgcattgacaaaaaatttaggttcgtcagtcggatctacgatt 

atgggctatatgtatgcactaaatgttggtttatttggttctaatttacacaatgtatta
ggattagtcttaataattgcagtatgtttaattgtaatgtggatgacattatataaaagc 
aatactattcaatcttag 

Sequence 2644 

MNLKSIITVMALILIMFMAAIETSIISLALPTIKNSLNAGNLVSLVFTVYFIALVIANPI
VGELMSRFKIIYIAVVGVLLFALGSLMSGLSQTFTFLIISRTVQGFGAGVMMSLSQIVPK 
LAFEIPLRYKIMGIVGSVWGISSIIGPLLGGAILEFASWHWLFYINIPIAIVAIILVLMT 
FHFPDETQVQQSRFDIKGLIIFYIFIALLMFGLLNQHHIIFNIFSIILALAVLWLLFKIE 
NSIEQPFLPTKEFNISIVLVFITDLLIAITLMGYNLYIPVYLQEKLSLSPLQSGFVIFPL

SVAWITLNFNLAKIEAHFTRKTLYICSFFVLLVSSLMIMFGLKLPLLIAFAVVFAGLSFG
YIYTKDSVIVQEETSPKNMKKMMSFYALTKNLGSSVGSTIMGYMYALNVGLFGSNLHNVL 
GLVLIIAVCLIVMWMTLYKSNTIQS* 

Sequence 2645 
Contig_0795_pos_9882_9001,

putative peptide of unknown function 

atgccaagttcaatgttgcatgagcatggatggagcggtatcaatttcaattcaccattt 
gcagatatggcaattttattaggtcttaactggttagcaatattactttatatggaagca 
gtt.gtit^.accgtttggtactggagtttcttttgttgccgttactggacgtgtgttacgc 
gctatggaagaaaatgggcatattcctaaattcttaggtaaaattaataaaaagtataat
atcccacgtgttgccattgcatttaatgcaattatcagcatgattatggtgacattgttc 
cgtgactggggtacactagctgcggttatttctactgcaacattagttgcatatttaact 
ggtccaactacggttatttcattacgtaaaatggcaccaaaaatgactcgtccatttaaa 
gctaatattttaaaatttatggcacctttatcctttgttttagcatcattagctatctat
tgggcaatgtggccaacaacagcagaagttattttaattattattttaggtttacctatt
tatttcttctatgaatataaaatgaattggaagaatactaaaaaacaaattggcggaagc 
ttatggattattatctatcttattgttctcgcattcttatcatttattggaagtaaagag 
ttcaaaggcttaaattggattcactatccatgggatttcttagtcattgtaatcgttgct 
ttaatcttctatcaactaggtacgacaagttactttgaaagtatttatttcaagcgtgcg
aacaagttgaataagaaaatgggcgataagttgcgtaaaacacgcaaaaaagcgcgtcat
aaagattggaaagaacgcgatcgacaagagcaaaatcaataa

Sequence 2646

MPSSMLHEHGWSGINFNSPFADMAILLGLNWLAILLYMEAVVSPFGTGVSFVAVTGRVLR
AMEENGHIPKFLGKINKKYNIPRVAIAFNAIISMIMVTLFRDWGTLAAVISTATLVAYLT 
GPTTVISLRKMAPKMTRPFKANILKFMAPLSFVLASLAIYWAMWPTTAEVILIIILGLPI 
YFFYEYKMNWKNTKKQIGGSLWIIIYLIVLAFLSFIGSKEFKGLNWIHYPWDFLVIVIVA 
LIFYQLGTTSYFESIYFKRANKLNKKMGDKLRKTRKKARHKDWKERDRQEQNQ* 


Sequence 2647

Contig_0795_pos_8696_7329,

is similar to (with p-value 2.0e-47) 

>gp:gp|U46134|BSU46134_3 Bacillus subtilis putative orf1 unk
nown protein, putative transcriptional regulator (sir), and
intracellular esterase B (estB) genes, complete cds. NID: g1
762123. >gp:gp|Z99121|BSUB0018_125 Bacillus subtilis complet
e genome (section 18 of 21): from 3399551 to 3609060. NID: g
2635827. >gp:gp|Z71928|BSYVEFGNS_1 B. subtilis pnbA, sigL, yv
e[J,K,L,M,N,O,P,Q,R,S,T] and yvf[A,B,C,D,E,F,G,H] genes. NID
: g1495?76. >gp:gp|Z94043|BSZ94043_47 B. subtilis genomic DNA
fragment (88 kb). NID: g1945641.

gtgtgttcaaacatggttcaagtcaagataggtaactgtaccatcaatggcttacacaaa 
aagaatattgatgtattcttaggtatcccgtatgccaaaccattcaataagatatctcga 

ttccaacattcaaagtttatggaactaagcaaaccaatgattgatgcaactcatattcaa
tccatcccaccacaaccctacaattcacttgaagactttttttcgatgacagattcatcg 
tttaattcttttaaacaaaatgattattgcctgtttttaaatatttggaaaccatcgtcc 
aatcaaaatcatttacctgtagtgatttacttttatggtggtagttttcttcaaggacat 
ggcacagctgaactatattgtcctgaacacatagtagaacaagaaaatataatagtagtt 

acttttaattatcgcttaggtgcactcggctacctagattggtcttattttaatcaacat
ttgaactataataatgggatttctgatcaaattaacgttttaagatgggtacatcaatat 
atcgaacattttggcggtgattcaaataacgtgacactaatgggtcaatctgcaggtagt 
atgagtatcatgacattaatgcaaatgcccgaacttgatgattattatcataaagtgatg 
ttattaagtggaacgttaactactgatacaccactcaatgcacatactaaagtacaacat 

ttttcacaactcatgaggcattattttcctaataaaacacttaagacacttaccagtgat
gacattttatatctaatggagtctcaaaaaatagagcgtggaagatctcgtggacttgat 
ttgatttatcaacctattaaagatcatcatatgtcacgatccattaaaaaatttcccaaa 
ccgacattcatgagttatacacacgatgaaggtgatatttatattgaagacgcaacacgc 
accttaccttctgaacgttttattcacttgatgtctcaatatggtacacacgtcgaaaaa 

aatgatgccctcacaatgaaacaacaaagaaatttaataacagagtattgttttgttcgt
ccaatttatctatttttaaatcaaatgaatagttgcgacacttggctagcacgttttgat
tggcaccaaccccatacctcctactttaaaagtgcatatcatatattggatttagtattt 
tggtttggtcacctctctattttgactaaaaatcattattctataactcaacatgatatg 
aatttaagtagtaacatgatatctgacttagcttattttgcccgaaaaggtaagatgcca 

tggaaatgttatgaacctcaacatcaagcgttacatatctatggataa

Sequence 2648 

VCSNMVQVKIGNCTINGLHKKNIDVFLGIPYAKPFNKISRFQHSKFMELSKPMIDATHIQ 
SIPPQPYNSLEDFFSMTDSSFNSFKQNDYCLFLNIWKPSSNQNHLPVVIYFYGGSFLQGH
GTAELYCPEHIVEQENIIVVTFNYRLGALGYLDWSYFNQHLNYNNGISDQINVLRWVHQY
IEHFGGDSNNVTLMGQSAGSMSIMTLMQMPELDDYYHKVMLLSGTLTTDTPLNAHTKVQH
FSQLMRHYFPNKTLKTLTSDDILYLMESQKIERGRSRGLDLIYQPIKDHHMSRSIKKFPK
PTFMSYTHDEGDIYIEDATRTLPSERFIHLMSQYGTHVEKNDALTMKQQRNLITEYCFVR
PIYLFLNQMNSCDTWLARFDWHQPHTSYFKSAYHILDLVFWFGHLSILTKNHYSITQHDM
NLSSNMISDLAYFARKGKMPWKCYEPQHQALHIYG*

Sequence 2649

Contig_0795_pos_2619_2143,
is similar to (with p-value 4.0e-68)

>gp:gp|Z99108|BSUB0005_72 Bacillus subtilis complete genome 
(section 5 of 21): from 802821 to 1011250. NID: g2633065. >g 
p:gp|D78509|D78509_8 Bacillus subtilis YfjG-YfjR genes, comp 
lete cds. NID: g2780390. 

atggaagacgtgacagatattgtctttcggcatgttgtcagtgaagctgcgagaccagat
gtattttttactgaatttaccaatactgagagttactgtcaccctgaaggtattcatagt 
gtgcgcggacgcttaacttttagtgacgacgaacaaccaatggtagcgcacatctggggc 
gataaaccagaacaattccgagaaatgagtatcggcttagcggatatgggttttaaaggt 
atagatttaaatatgggttgccctgtcgcaaacgttgcgaaaaaaggtaaaggatccggc 

ttaattctacgacctgaaacggcagccgaaatcattcaagcttctaaagcaggtggtcta
ccggtcagtgtaaaaacacgtttaggttattacgatatcgatgaatggcgagactggtta 
aaacacgtcttcgaacaagttaggtgcgcgctctggtttaatggcagagccaaatga 

Sequence 2650 

MEDVTDIVFRHVVSEAARPDVFFTEFTNTESYCHPEGIHSVRGRLTFSDDEQPMVAHIWG
DKPEQFREMSIGLADMGFKGIDLNMGCPVANVAKKGKGSGLILRPETAAEIIQASKAGGL 
PVSVKTRLGYYDIDEWRDWLKHVFEQVRCALWFNGRAK* 

Sequence 2651 
Contig_0795_pos_2063_660,

is similar to (with p-value 0.0e+00)

>gp:gp|AF029225|AF029225_1 Staphylococcus carnosus NarG, Nar
H, NarJ, and NarI genes, complete cds. NID: g3929521.

atgactgcgacgccattatattcagatatcgttttacctgctgcaacttggtatgaaaaa 

catgatttatcttctacagacatgcatccatttattcatccatttaacccagcgattgac
ccattatgggaatcgcgttcggactgggatatttataaaactctaagtaaagctgtttca 
gaaatggcgaaagattatcttccaggtaaatttaaagatgtcgtaactacaccattagga 
catgattcaaaacaagaaatttcaactgaatacggtattgtaaaagattggtctaaagga 
gaaattgaaggtgtgccaggtaaaacaatgcctaatttttctatcgtagagcgagactat 

acacaaatttacgataaattcgttactgttggtccaaaactagaaaaagggaaaataggt
gctcatggtgtgagttatagcgttagtgaagagtacgaagaacttaaaagtatagttgga 
ac^:tcgaatgatgataatactatttcagttaaaaatgatagaccgagaatagatacagcg 
agaaaaccagcagatgtcattttgaatatatcctctgctacaaacggcaaattatcacaa
aagtcatatgaagatttagaaaatcaaacaggtatggaacttaaagatatttctaaagaa 

cgtgcttctgaaaagatatcattcttaaacattacttctcaaccaagagaagtgattcca
actgcagtattccctggctctaataaagatggaagacgctactcaccgtttacaactaat 
gttgaacgtttagtgccatttagaacactaactggacgtcaaagttattatatagatcat 
gaggtattccaacagtttggcgaaagtttaccggtatataaacctactttacctccaatg 
gtatttggtgctcgtgataaaaaagttaaaggtggacaagatacattagtgcttcgatac 

cttacacctcatggaaaatggaatattcattcaacttatcaagataatgaacgcatgttg
acgttgtttagaggtggaccagttgtatggatttcaaatgaagacgcagctgaccatggt 
attaatgataacgactggttagaagtatacaacagaaacggagttgttactgccagagct 
gtaacatctcatcgtatgcctagaggcacaatgtttatgtatcatgcacaagataaacat 
atagagacacctggttctgaaattactgatactcgtggaggttctcataatgcacctact 

cgtattcacttgaaacctactcaattagtaggaggatatgcacaaattagttatcacttt
aactattatggaccaattggaaatcaaagagatgagtatgtagctgttagaaaaatgaag 
gaggtcaattggcttgaagattaa 

Sequence 2652 

MTATPLYSDIVLPAATWYEKHDLSSTDMHPFIHPFNPAIDPLWESRSDWDIYKTLSKAVS
EMAKDYLPGKFKDVVTTPLGHDSKQEISTEYGIVKDWSKGEIEGVPGKTMPNFSIVERDY
TQIYDKFVTVGPKLEKGKIGAHGVSYSVSEEYEELKSIVGTWNDDNTISVKNDRPRIDTA
RKVADVILNISSATNGKLSQKSYEDLENQTGMELKDISKERASEKISFLNITSQPREVIP
TAVFPGSNKDGRRYSPFTTNVERLVPFRTLTGRQSYYIDHEVFQQFGESLPVYKPTLPPM
VFGARDKKVKGGQDTLVLRYLTPHGKWNIHSTYQDNERMLTLFRGGPVVWISNEDAADHG
INDNDWLEVYNRNGVVTARAVTSHRMPRGTMFMYHAQDKHIETPGSEITDTRGGSHNAPT 
RIHLKPTQLVGGYAQISYHFNYYGPIGNQRDEYVAVRKMKEVNWLED* 

Sequence 2653

Contig_0795_pos_0_646,

is similar to (with p-value 0.0e+00)

>gp:gp|AF029225|AF029225_2 Staphylococcus carnosus NarG, Nar
H, NarJ, and NarI genes, complete cds. NID: g3929521.

atggtattgaatctagacaaatgtattggttgtcatacttgcagtgtgacatgtaaaaac
acatggacaaatcgacctggtgcagaatatatgtggtttaataacgtagaaacaaaaccg 
ggtgtaggatatccaaaaagatgggaagaccaaggacaatataaaggtggttgggtgcta 
aataaaaaaggaaagcttgaattaaaatctggtaacagatggtcaaaaattgctttaggt 
aaaatcttctataatccagacatgccactcattcaagattattatgaaccgtggacatat 

aactatgaacacttaaccaatgctaaacaaggacagcactctcccgtggcgacagctcac
tctttaatttcaggtgatagattgaatcttaaatgggggccaaactgggaagatgattta 
gctggaggtcacattacaggaccagaggatccaaatattcagaaaatagaagaagatatt 
aaattccaattcgatgagacatttatgatgtatttaccaagactatgtgaacactgttta 
aatccaagttgcgtagcatcttgtccatcaggagctatgtataaacgagatgaggatggt 

atcgtactcgtcgatcaagaagcctgtcgaggttggagatactgta

Sequence 2654 

MVLNLDKCIGCHTCSVTCKNTWTNRPGAEYMWFNNVETKPGVGYPKRWEDQGQYKGGWVL 
NKKGKLELKSGNRWSKIALGKIFYNPDMPLIQDYYEPWTYNYEHLTNAKQGQHSPVATAH 
SLISGDRLNLKWGPNWEDDLAGGHITGPEDPNIQKIEEDIKFQFDETFMMYLPRLCEHCL
NPSCVASCPSGAMYKRDEDGIVLVDQEACRGWRYCX 

Sequence 2655 

Contig_0796_pos_1625_2248,

putative peptide of unknown function

atggattatgtatacacaatttataaaaatcctagatataacatcattcaaaaagataat 
cgctatttaatggtcgatttagaacaaaattggtattcatatttatgtccaatgctgaat 
tggtttatacctattaaattcacagaattaacttatcaagaattcaataatataaacata 
tttcataatggaggacaaaagagtcatggtatgatggctgctggcgttggcgtcactatc 

agtgtgctattaagaagtcttgtgggttatatagatattaatattagtcgaatttggata
gtttttatgtttttaattggatttgttgctgtgatcacacttcgtttatctataagaaag 
aagttaaatcatccagcatttaataaaaagagtaaacaaaaagtaatattgataccatca 
tttaaaaatatgatattggtggtgttttgctattttatgatgctgtttttctcaattgca 
ccttttcaaatgatttttgaggaaaagaaaaacatcttaggatatatactttgggtaggt 

gtattatttatatttactactttgaatatggcttcaatttctgatagaaaagtacatgcc
aaaattaaaaatataagaagatag 

Sequence 2656 

MDYVYTIYKNPRYNIIQKDNRYLMVDLEQNWYSYLCPMLNWFIPIKFTELTYQEFNNINI 
FHNGGQKSHGMMAAGVGVTISVLLRSLVGYIDINISRIWIVFMFLIGFVAVITLRLSIRK
KLNHPAFNKKSKQKVILIPSFKNMILVVFCYFMMLFFSIAPFQMIFEEKKNILGYILWVG 
VLFIFTTLNMASISDRKVHAKIKNIRR* 

Sequence 2657
Contig_0796_pos_4131_4508,

putative peptide of unknown function 

atgcaattttattatagtaattgtgaacagaatgcatcaaacttattagtcatcaacgta 
caacctaatgaaggattttctttatgtgtgaatggtaagaaaagtaatcaaaataatgaa 
atgcaaaaagtgaagctttcttatactatgccgattaaagataaaatgaacacagttgat 
gcatatgaaaatcttatttacgatacattaattggagaacaaacaaaatttacgcattgg
gaagaattaaaaattcttggaaatttattgatgatattgaaaatgtatggaaacaagaat 
agccacagtttcctaattatgcctttggatgctatgggcctaaagaaagtgaaaaattac 
ttagtgaagacggattga

Sequence 2658

MQFYYSNCEQNASNLLVINVQPNEGFSLCVNGKKSNQNNEMQKVKLSYTMPIKDKMNTVD 
AYENLIYDTLIGEQTKFTHWEELKILGNLLMILKMYGNKNSHSFLIMPLDAMGLKKVKNY 
LVKTD* 


Sequence 2659 

Contig_0796_pos_4627_0,

is similar to (with p-value 6.0e-53) 

>gp:gp|L76359|STMDRRC_1 Streptomyces peucetius daunorubicin
resistance protein (drrC) gene, complete cds. NID: g1196906.
atggattttattaatattacaggtgcttcacaaaataacttgaaaaacatagatgtaaat 
atcccaaaacacttagtaacggtatttacaggtcgttctggttcagggaaatcatcttta 
gtgtttaatactgttgctgcggagtctgaacagctactaaatgaaagttattctagttat 
attcaatttcatttaaatcaacaacccagaccgaaagtaaagaaaattaaaaatcttcct 
gtagcaatgacgattaatcagaaaagattcaatgggaattctcgctccacggtaggaaca
gtttcagatatatatgcttctgttagattactgtggtctagaataggcgaaccgtttgtt 
ggttattcagatgcatattccttcaatagtcctaagggcatgtgtaaaacttgtgaggga 
ttaggatatattgaagacattaacttagatgaattgctagattgggataagtctttaaat 
gaaggtgcaatagactttccttcttttggaccagacaaagagcgtggtaaagcctatcga 
gatagt

Sequence 2660

MDFINITGASQNNLKNIDVNIPKHLVTVFTGRSGSGKSSLVFNTVAAESEQLLNESYSSY 
IQFHLNQQPRPKVKKIKNLPVAMTINQKRFNGNSRSTVGTVSDIYASVRLLWSRIGEPFV 
GYSDAYSFNSPKGMCKTCEGLGYIEDINLDELLDWDKSLNEGAIDFPSFGPDKERGKAYR
DS 

Sequence 2661 

Contig_0796_pos_728_342,

putative peptide of unknown function

atggctattgtaaataaggttattattgttgaaggtaaatcggataagaaaagagtacaa 
caagtaatcgctgaacctgcaaatatcatttgtacacatggcactatgagtatagataag 
atagacaacatgatagaaacactttatgacaaacaagtttatgttcttgccgattctgat 
gatgagggtgaaaaaattagaaaatggtttaaacgttatttaagcgaaagtgaacatatt 

tatgttgataaaacgttttgtgaggttgctaagtgtcctaaaaattatttagcacatgta
ttaagtagatatggttttaatgtaaaaaaagaaaagaaacttatgaataatttaaaaact 
gaaaggctagttttagtaaatgaataa 

Sequence 2662 

MAIVNKVIIVEGKSDKKRVQQVIAEPANIICTHGTMSIDKIDNMIETLYDKQVYVLADSD
DEGEKIRKWFKRYLSESEHIYVDKTFCEVAKCPKNYLAHVLSRYGFNVKKEKKLMNNLKT
ERLVLVNE*

Sequence 2663 
Contig_0798_pos_5428_4553,

putative peptide of unknown function 

atgatattgaattcaaaagttaaaggtattattgctatattgatttcagctgtgggtttt 
agttttatgtcagtcttttttagattggccggtgatttaccagtctttcaaaaatctcta 
gctagaaattttgtagccatgtttataccattattttttatttataaatataggcaacct 

atgtttggaaaattaagtagtcaacccctactcatctcacgttcaacacttgggttaatt
ggtgtcttacttaatatctacgcaattgatcacatggtattaagtgatgctgatacatta 
atgaaattaaatcctttttggacaattgttcttagtttaatttttttacatgaaaaggta 
cgaaaatatcaaatcacggcgatgattattgctataatagggatgctattaattgttaaa 
ccagaattttcatcatcagttattccttcaatagcaggattactatccggtatttttgca 

gcttctgcctacacatgtgttagagcactcagcactcgtgaaaaaccttatacgatagtg
ttttatttttcattattctcagttgtagttcttatacctttttcaatatttacttataca 
cctatgacaacaattcaaattcttttcttactcggcgctggattatcagcagctgtagga 
caaattggtataacattggcttatagttttgctccagcaaaagatatctccatcttcaca 
tatgcgtctataatatttactgcattatttggatttattctgtttggagaatcacctgat
atgtttgcaacagtaggatatattgtcattatcggagcaagttactatatgtttgataaa
gcaagacgtgaaacaactataaatcaaaataattaa 

Sequence 2664 

MILNSKVKGIIAILISAVGFSFMSVFFRLAGDLPVFQKSLARNFVAMFIPLFFIYKYRQP
MFGKLSSQPLLISRSTLGLIGVLLNIYAIDHMVLSDADTLMKLNPFWTIVLSLIFLHEKV 
RKYQITAMIIAIIGMLLIVKPEFSSSVIPSIAGLLSGIFAASAYTCVRALSTREKPYTIV
FYFSLFSVVVLIPFSIFTYTPMTTIQILFLLGAGLSAAVGQIGITLAYSFAPAKDISIFT
YASIIFTALFGFILFGESPDMFATVGYIVIIGASYYMFDKARRETTINQNN* 


Sequence 2665 

Contig_0798_pos_3629_2250,

is similar to (with p-value 0.0e+00)

>sp:sp|P94408|YCLF_BACSU HYPOTHETICAL 53.3 KD PROTEIN IN SFP
-GERKA INTERGENIC REGION. >gp:gp|Z99106|BSUB0003_15 Bacillus
subtilis complete genome (section 3 of 21): from 402751 to
611850. NID: g2632653. >gp:gp|D50453|D50453_69 Bacillus subt
ilis DNA for 25-36 degree region containing the amyE-srfA re
gion, complete cds. NID: g1805369.

atgagagcaatgcttattttttatatgtattttgcactccaggaaaacggacttggaatg
gacaaaacaacagccatgtctatcatgtctgtatatggatcattaatttatatgtcttca 
attccaggtggctggatagctgatagaattacaggtacccgaggtgccacattaatcgga 
gctatattaattattataggtcacatatgtttaagccttccttttgcaatggtaggttta 
ttcacttctatgttctttattattgtaggatcaggtttaatgaaaccaaacatttcaaat 

attgttggtagactctatccagaaaacgacgtacgcatggatgctggatttgttatcttc
tatatgtcagtgaacatgggtgcactcgtttcaccaattattttacaacactatattgat 
attagaaatttccatggcggcttcttgattgcagcaatagggatggctcttggattagtt 
tggtacttactatttaaccgaaaaactttgggtagtatcggtatgaaaccgacaaaccca 
ttatcttcttctgaaaagaaaaagtacggaacaatcatcggaatcgttgttatagcaata 

gtattaatccttatgattgcttactttacgcatacgctatcatttaatttaatcagtaat
actgttttaattttaggtattgctttaccaatcatttactttacaacaatgattagaagt
aaagaagttactgatactgaaagatcaagagtaaaagcatttattccgttattcatttta
ggcatgttattttggtcaattcaagaacaaggatctaatgtattaaatatctatggtatt 
gaaaactctgatatgaaattaaatttatttggttggaaaacacattttggtgaagctatt 

ttccaaaccattaacccattatttattttattatttgcacccgtggttactcttttatgg
caaaagctaggaaagaaacaacctagcctacctattaagtttgcaattggtactatttta 
gcaggcgcatcctacatacttatgggagcaatcggtcatatttatggggatacacaattc 
tcagttaactgggttattctttcatacgttatctgtgttattggtgagctttgtctctct 
ccaactggtagtagtgcagcagttaaattagcacctaaggcatttaacgcacaaatgatg 

agcctttggttattaactaacgcttcagctcaggccattaacggtacattagttaaatta
attaaaccacttggtcaaaccaattactttatcttcttaggtgttgttgcaaccgtgatt 
acgttaattatattagcgtttattcctaagatttctaaagcaatgaaaggtattcgttaa 



Sequence 2666

MRAMLIFYMYFALQENGLGMDKTTAMSIMSVYGSLIYMSSIPGGWIADRITGTRGATLIG 
AILIIIGHICLSLPFAMVGLFTSMFFIIVGSGLMKPNISNIVGRLYPENDVRMDAGFVIF 
YMSVNMGALVSPIILQHYIDIRNFHGGFLIAAIGMALGLVWYLLFNRKTLGSIGMKPTNP 
LSSSEKKKYGTIIGIVVIAIVLILMIAYFTHTLSFNLISNTVLILGIALPIIYFTTMIRS

KEVTDTERSRVKAFIPLFILGMLFWSIQEQGSNVLNIYGIENSDMKLNLFGWKTHFGEAI
FQTINPLFILLFAPVVTLLWQKLGKKQPSLPIKFAIGTILAGASYILMGAIGHIYGDTQF
SVNWVILSYVICVIGELCLSPTGSSAAVKLAPKAFNAQMMSLWLLTNASAQAINGVLVKL
IKPLGQTNYFIFLGVVATVITLIILAFIPKISKAMKGIR*

Sequence 2667

Contig_0798_pos_1595_672, 

is similar to (with p-value 7.0e-45) 

>sp:sp|P39074|BMRU_BACSU BMRU PROTEIN. >gp:gp|L25604|BACBMRU
RBE_1 Bacillus subtilis bmrU, multidrug efflux transporter (
bmr) and its regulator (bmrR) genes, complete cds, and branc
hed-chain 2-oxo acid dehydrogenase (bfmB) gene, 3' end. NID:
g2558636. >gp:gp|D84432|BACJH642_251 Bacillus subtilis DNA,
283 Kb region containing skin element. NID: g2627063. >gp:g
p|Z99116|BSUB0013_111 Bacillus subtilis complete genome (sec
tion 13 of 21): from 2395261 to 2613730. NID: g2634723.
atgaaacaaccgtataaccatggtgttcttttctatcatgaacatagcggtttgaaagat 
atacataatggcataggagaagttgcaaaatctcttagttcaatgtgtaaacacctctct 
cttcaactcagtgaaaataaaggcgatattattaaatattgtaaatctattaaaaatgaa 

aattatagctctgatgtagacgttttatttattttaggtggagatggtacacttaatgaa
ctagtaaatggcgttatgcagtatcagttaaatttaccaatcggtgtaataccaggtggt 
acctttaacgattttacaaaaacacttcaactgcaccctaattttaaaacagctagtgag 
caattattaacatcacatgctgaatcatatgatgtgttaaaagtgaacgacttatatgta 
cttaatttcgttggacttggcttaatagtacaaaatgcagagaatgttcaagatggttct 

aaagatatattcggtaaattcagctatattggatcaaccgttaaaacgttattaaatcct
gttaaatttgatttctcattgactgttgatggtgaaacaaaagaaggcaatacttcgatg 
atgttaatagcaaacggtcccaatataggtggtggacaaattccgctaaccgatttatcg 
ccacaagatggaagagcaaacacatttgtatttaatgatcaaacactaaatatattgaat 
gatatattaaaaaaacgtgatagtatgaattggaacgaaatcacacaaggtattgatcac 

atatcaggtaagcacatcacactctcaacaaaccctagtatgaaagtggatattgatggc
gaaattaatttagaaacaccaattgagattcaagtattacccaaagcgatacaacttctt 
actgcaactgaacaaaataattaa 

Sequence 2668 

MKQPYNHGVLFYHEHSGLKDIHNGIGEVAKSLSSMCKHLSLQLSENKGDIIKYCKSIKNE
NYSSDVDVLFILGGDGTLNELVNGVMQYQLNLPIGVIPGGTFNDFTKTLQLHPNFKTASE 
QLLTSHAESYDVLKVNDLYVLNFVGLGLIVQNAENVQDGSKDIFGKFSYIGSTVKTLLNP 
VKFDFSLTVDGETKEGNTSMMLIANGPNIGGGQIPLTDLSPQDGRANTFVFNDQTLNILN 
DILKKRDSMNWNEITQGIDHISGKHITLSTNPSMKVDIDGEINLETPIEIQVLPKAIQLL 

TATEQNN*

Sequence 2669 
Contig_0798_pos_0_315, 
putative peptide of unknown function 
atgtttgtgaattatttcacaatatctaaggagtggttgtatatgttatctgtaactaaa

aaaaatacatatgaatcaaacaaagatgaagtcacacaaatgattgattcattagcagaa 
aaaggacaagaagctctaaaagaactatctaaaaaatcacaacatgagattaatgacatt 
gtacatcagatgagcatggctgctgttgatcagcatatgcatttagctaaactagcttac 
gacgaaacaggtagaggtatttatgaagacaaagctatcaaaaatttatatgcctcagag 
tacatatggaattca

Sequence 2670 

MFVNYFTISKEWLYMLSVTKKNTYESNKDEVTQMIDSLAEKGQEALKELSKKSQHEINDI
VHQMSMAAVDQHMHLAKLAYDETGRGIYEDKAIKNLYASEYIWNS 


Sequence 2671 

Contig_0799_pos_5458_5027,

putative peptide of unknown function 

atgttcaaaaatatattattaccctatgatttcgaaaatgattttagtgctatccctgac 
tatttagaaaaagtcaccgatgaagattcagttgttgtaatttatcacgttgtaacagaa
aatgatcttgcaattagtgtcaagtattataataagcataaagaagatattattagagaa 
aaagagaaaaaactcactccatttttacgtgaattagaaaaaagagatattcaatataaa 
atagatgtagattttgggcatattaaagatacaatcttagaaaaaattacttctggagat 
ataaataatggtgaatttgatttagtaattatgagtaatcatagagtcgatttgaatatt 
aaacatgttttaggagatgttacacataagattgctaaaagaagttctgtcccagtacta
attgttaaataa 

Sequence 2672 

MFKNILLPYDFENDFSAIPDYLEKVTDEDSVVVIYHVVTENDLAISVKYYNKHKEDIIRE
KEKKLTPFLRELEKRDIQYKIDVDFGHIKDTILEKITSGDINNGEFDLVIMSNHRVDLNI
KHVLGDVTHKIAKRSSVPVLIVK* 

Sequence 2673 
Contig_0799_pos_2591_1617,

putative peptide of unknown function 

atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc 
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 
gcttgtttaggaccgacgctcaaacaaacaggcagcttacctatacatgagttaatattc

tttgaattaagagaacgtgcccgttttcatctagaaatcgagaatgaacaaaatcgactt
aaattttctttatgtttttctatttttactgagagaaataaagtggtgacacaagattta 
tatgactctatttctagagcatattattgtctcgttgatacacaagctaatcaaaatatg 
attgaacactacgcaggattgaacatgaatgatattaatcttttaagggtaacgcctttt 
gatgcgaagtcattacctaaccaaagtagtcaattgtatgacacttatattggattatgg 

atagn tQtjtttggacgagattgaaatacgagagattgtaaacagcttatttcaatatatt
caacataaagatggctataagttgaaaattttaactaagagtagagataatcttacggaa 
aatcttatagatgaagttgctcatctcaatgatttatatcaccaagagaaaaaggaaata 
agtgatgtaattgaagacgtgatacagaataaaaaagaaacaatcattgatattgaaaca 
gtaccgtttgaagaagatcttgtaagcgttatttcaaaattaagagttgtagtagattta 

tctttagagccgaaactttttttacaaatctgttgtattggcgcgggtataccacaaatt
aataaaaagagaacagattatgttaaacatatgcataatggatatattattgatgacata 
tcgcaaactgtagaatctttagattattttttggcacatttaaaaaatggaattattctt 
atgcatattccatga 

Sequence 2674

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTGSLPIHELIF 
FELRERARFHLEIENEQNRLKFSLCFSIFTERNKVVTQDLYDSISRAYYCLVDTQANQNM 
IEHYAGLNMNDINLLRVTPFDAKSLPNQSSQLYDTYIGLWIDGLDEIEIREIVNSLFQYI
QHKDGYKLKILTKSRDNLTENLIDEVAHLNDLYHQEKKEISDVIEDVIQNKKETIIDIET

VPFEEDLVSVISKLRVVVDLSLEPKLFLQICCIGAGIPQINKKRTDYVKHMHNGYIIDDI
SQTVESLDYFLAHLKNGIILMHIP*

Sequence 2675 
Contig_0799_pos_1108_701, 

putative peptide of unknown function

atgttcccaccccgaacacctagtagagatgccactaacccacctcaacaactcttacaa 
ttactaggaataccgtcaaatattacccatctcacatacaatttttctgagcatgcatta 
ccttggataagttttatcgtacactatagtttttctatcgctattgcaataatctatatt 
tatatcgcaaagaaatatacaaaaatcacactaggttatggtgctttatttggtatagtt 

atttggattgtttttcatttaatcttaatgccaattatgcatgtcgtaccgaatgctttt
gatcaaccattttcagaacacctatcagaattttttggacacattgtttggatgatggtt 
atagaaatggtcagaaggtatttctataatattcaattaaataaataa

Sequence 2676 

MFPPRTPSRDATNPPQQLLQLLGIPSNITHLTYNFSEHALPWISFIVHYSFSIAIAIIYI
YIAKKYTKITLGYGALFGIVIWIVFHLILMPIMHVVPNAFDQPFSEHLSEFFGHIVWMMV
IEMVRRYFYNIQLNK*

Sequence 2677 
Contig_0800_pos_2627_2971,

putative peptide of unknown function 

gtgacaaaccggaggaaggtggggatgacgtcaaatcatcatgccccttatgatttgggc
tacacacgtgctacaatggacaatacaaagggcagcgaaaccgcgaggtcaagcaaatcc 
cataaagttgttctcagttcggattgtagtctgcaactcgactatatgaagctggaatcg
ctagtaatcgtagatcagcatgctacggtgaatacgttcccgggtcttgtacacaccgcc
cgtcacaccacgagagtttgtaacacccgaagccggtggagtaaccatttggagctagcc 
gtcgaaggtgggacaaatgattggggtgaagtcgtaacaaggtag 

Sequence 2678

VTNRRKVGMTSNHHAPYDLGYTRATMDNTKGSETARSSKSHKVVLSSDCSLQLDYMKLES
LVIVDQHATVNTFPGLVHTARHTTRVCNTRSRWSNHLELAVEGGTNDWGEVVTR* 

Sequence 2679 
Contig_0802_pos_6032_5214,

is similar to (with p-value 0.0e+00)

>sp:sp|P23966|MENB_BACSU NAPHTHOATE SYNTHASE (EC 4.1.3.36) (
DIHYDROXYNAPHTHOIC ACID SYNTHETASE) (DHNA SYNTHETASE). >pir:
pir|A42715|A42715 dihydroxynapthoic acid synthetase - Bacill
us subtilis >gp:gp|M74521|BACMENAQUI_6 Bacillus subtilis men
aquinone operon, complete cds. NID: g557486. >gp:gp|M74538|B
ACMENAQ0P_4 Bacillus subtilis menaquinone operon: menF, menD
, menB and menE genes, complete cds. NID: g1185287.
atgactagacagtgggaaacacttagagaatatgatgaaattaaatatgaatttttcgaa 

gggattgccaaagtaacgattaatcgtccagaagtaagaaatgcatttactcctaaaaca
gttgctgaaatgattgatgcattttcacgtgcgcgtgatgatcaaaatgtatcagtaatt 
gtattaactggtgaaggggacaaagcgttttgttcaggtggagatcaaaaaaaacgtgga 
cacggtggttatgtaggtgaagatgatattcctcgtttaaatgtattagatttacaacgt 
ttaattcgtgtgattcctaaaccagtaatagcaatggttagaggctatgcaattggtgga 

ggaaatgtacttaatgttgtttgtgatttaactatcgctgcagacaatgctatttttgga
caaactggaccaaaagtaggctcatttgatgctgggtacggttctggctacctagctcgt 
atagttggccataaaaaagcaagagaaatctggtacttatgccgtcaatataatgcacag 
gaagctttggatatgggcttagtgaatactgtagttccattagaacaagttgaagacgaa 
acagttaaatggtgtaaagacatcatgcaacactcaccaactgctttacgtttcttaaaa

gcagcaatgaatgctgatactgatggtttagctggtttacaacaaatggctggagatgcg
ac'-tt acVttactatactactgatgaagcgaaagaaggacgtgacgcgtttaaagaaaaa 
cgtaatcctgattttgaccaattccctaaattcccataa 

Sequence 2680 

MTRQWETLREYDEIKYEFFEGIAKVTINRPEVRNAFTPKTVAEMIDAFSRARDDQNVSVI
VLTGEGDKAFCSGGDQKKRGHGGYVGEDDIPRLNVLDLQRLIRVIPKPVIAMVRGYAIGG 
GNVLNVVCDLTIAADNAIFGQTGPKVGSFDAGYGSGYLARIVGHKKAREIWYLCRQYNAQ 
EALDMGLVNTVVPLEQVEDETVKWCKDIMQHSPTALRFLKAAMNADTDGLAGLQQMAGDA 
TLLYYTTDEAKEGRDAFKEKRNPDFDQFPKFP*


Sequence 2681 

Contig_0802_pos_1667_123, 

is similar to (with p-value 0.0e+00)

>sp:sp|P39634|ROCA_BACSU 1-PYRROLINE-5-CARBOXYLATE DEHYDROGE
NASE (EC 1.5.1.12) (P5C DEHYDROGENASE). >pir:pir|S39731|S397
31 hypothetical protein - Bacillus subtilis >gp:gp|X73124|BS
GENR_77 B. subtilis genomic region (325 to 333). NID: g413923
. >gp:gp|Z99123|BSUB0020_74 Bacillus subtilis complete genom
e (section 20 of 21): from 3798401 to 4010550. NID: g2636240


atggtagtacctttcaaaaatgaacctggtattgatttttcagtacagacaaatgttgag 
cgttttaatgaagaattaaggaaagtaaaagcgcaactaggacaagatataccacttgtg 
attaacggagaaaaacttactaaaactgatacttttaattcagtgaatcctgcgaataca 
tcacagctcattgcgaaagtgtctaaagcaacgcaagatgatattgaaaaagctttcgaa 

tcagcaaatcatgcgtatcaatcatggaagaagtggtcgcataaggaccgtgcagaatta
ctgttacgtgtagccgcaattatccgtcgtcgaaaagaggaaatttccgctattatggtt 
tatgaagccggcaagccttgggatgaagcagttggagatgcagctgagggtattgatttt 
atagaatattatgcaagatcaatgatggaacttgcagatggtaagccagtattagacaga 
gaaggtgaacataatcgctatttttataaacctattggtacaggcgtgacaattccacca 

tggaattttccatttgcaattatggctggtacaaccttagcccctgttgttgcaggtaac
actgtattattaaagcctgctgaggatacagttttgactgcttataaattaatggaaata 
ttagaagaagcaggtttaccccaaggtgttgtaaattttgttcctggtgatccaaaagaa 
attggagattatttagtcgaccataaagatacacattttgtcacatttacaggatcccga 
gctacaggtacacgtatttatgaacgtagtgctgtagtgcaagaaggacaacagttttta
aaacgtgttattgcagagatgggtggcaaagatgcgatagttgtagataataatgtagat
acagatttagcggctgaagcaattgttacatctgcttttggtttctctggtcaaaaatgc 

tctgcgtgttctcgtgtcatagtccatcaagacgtacatgatgaaatattggaaaaagca 
attcaattaactcaaaaattaactttaggtaatactgaagagaacacatttatggggcca 
gtaattaatcaaaaacaatttgataaaatcaaaaattatattgaaattggtaaaaaagaa
ggcaaactagagactggtggtggaacagatgattctaccggttatttcattgaaccaacg 
attttctccggactacaatctgcggatcgtatcatgcaagaagaaatttttggaccagtc 
gtaggctttattaaggtcaaggattttgatgaggctattgaagtagctaatgatactgac 
tatggtttgacaggcgctgtaattactaatcatcgtgaacattggattaaggctgtgaat 
gaatttgatgtaggtaacctttacttgaatagaggttgtacagctgcagtagtgggttat
catccatttggtggattcaagatgtctggtacagatgctaaaacaggaagtccagattac 
ttacttaatttcttagaacaaaaagttgtttctgaaatgttttaa 

Sequence 2682 

MVVPFKNEPGIDFSVQTNVERFNEELRKVKAQLGQDIPLVINGEKLTKTDTFNSVNPANT
SQLIAKVSKATQDDIEKAFESANHAYQSWKKWSHKDRAELLLRVAAIIRRRKEEISAIMV 

YEAGKPWDEAVGDAAEGIDFIEYYARSMMELADGKPVLDREGEHNRYFYKPIGTGVTIPP
WNFPFAIMAGTTLAPVVAGNTVLLKPAEDTVLTAYKLMEILEEAGLPQGVVNFVPGDPKE
IGDYLVDHKDTHFVTFTGSRATGTRIYERSAVVQEGQQFLKRVIAEMGGKDAIVVDNNVD 
TDLAAEAIVTSAFGFSGQKCSACSRVIVHQDVHDEILEKAIQLTQKLTLGNTEENTFMGP
VINQKQFDKIKNYIEIGKKEGKLETGGGTDDSTGYFIEPTIFSGLQSADRIMQEEIFGPV
VGFIKVKDFDEAIEVANDTDYGLTGAVITNHREHWIKAVNEFDVGNLYLNRGCTAAVVGY
HPFGGFKMSGTDAKTGSPDYLLNFLEQKVVSEMF*

Sequence 2683

Contig_0804_pos_874_1548,

is similar to (with p-value 3.0e-76)

>sp:sp|P39788|END3_BACSU PROBABLE ENDONUCLEASE III (EC 4.2.9
9.18) (DNA-(APURINIC OR APYRIMIDINIC SITE) LYASE). >gp:gp|L4
7709|BACYPIA_26 Bacillus subtilis (clone YAC15-6B) ypiABF ge
nes, qcrABC genes, ypjABCDEFGHI genes, birA gene, panBCD gen
es, dinG gene, ypmB gene, aspB gene, asnS gene, dnaD gene, n
th gene and ypoC gene, complete cds's. NID: g1146223. >gp:gp
|U11289|BSU11289_3 Bacillus subtilis 168 asparaginyl-tRNA sy
nthetase (asnS) and endonuclease III (jooB) genes, partial c
ds and DnaD protein (dnaD) and (jooC) genes, complete cds.
NID: g533096. >gp:gp|Z99115|BSUB0012_174 Bacillus subtilis c
omplete genome (section 12 of 21): from 2195541 to 2409220.
NID: g2634478.

atggagagaatcctaatgataagtaagaaaaaagcattacaaatgattgacgttatagca
gatatgtttcctaatgcagaatgcgaattaaaccatagaaatgcattcgatcttacaata 
gctgtattattatcagcacagtgtactgataatctagtcaatcgtgtcactcaatcatta 
tttagaaaatatcgaacacctgaagattatttaaatgtgagtgatgaagaattacaaaat 
gatatacgctctattggattatatcgcaataaagccaaaaatataaaaaaattatgccac 

tctttaattgaacaatttaatggtcaaatcccacaaacacataaagaattagagagtcta
gctggagtggggcgtaaaacagcaaatgttgtaatgagtgtcgcatttggagaaccttct 
ttagctgtcgatactcatgttgagagagtttctaaacgtttgggaattaatcgttggaaa 
gatagtgtaagacaagtagaagatcgattatgtgatattatcccaagagatagatggaat 
aaaagccatcatcaattaatattttttgggagatatcattgtcttgctagaaaacctaaa 

tgtgagatatgtccgctgttaaatgattgtagagaaggacaaaaacgacataaagcaaag
ataaaggaggcgtga 

Sequence 2684 

MERILMISKKKALQMIDVIADMFPNAECELNHRNAFDLTIAVLLSAQCTDNLVNRVTQSL 
FRKYRTPEDYLNVSDEELQNDIRSIGLYRNKAKNIKKLCHSLIEQFNGQIPQTHKELESL
AGVGRKTANVVMSVAFGEPSLAVDTHVERVSKRLGINRWKDSVRQVEDRLCDIIPRDRWN
KSHHQLIFFGRYHCLARKPKCEICPLLNDCREGQKRHKAKIKEA*

Sequence 2685
Contig_0804_pos_1553_1885,

putative peptide of unknown function 

atgattgaaaaacaggatttcaatcatatagaggaccaacttgatcaactagcaagtaat 
aaacaactcaaaacaccagaagctagggaacttttagatagttatttcgatttaattatt 
aattattttaaacaaataaataacatagatgaaattcattttaatcaactcgatacatat
ccagtagttccaatgaattttgatgaacgctatcattatatggttgcacgtaaacaccat 
tttatgggctatcgtcaaatgaaaacattgaaatcagaattaataaaaatgaatgcatct 
tatctaattagaaagcaacgtcaacaaaaataa 

Sequence 2686

MIEKQDFNHIEDQLDQLASNKQLKTPEARELLDSYFDLIINYFKQINNIDEIHFNQLDTY 
PVVPMNFDERYHYMVARKHHFMGYRQMKTLKSELIKMNASYLIRKQRQQK* 

Sequence 2687 
Contig_0804_pos_5054_0,

is similar to (with p-value 1.0e-16)

>gp:gp|AF076683|AF076683_2 Staphylococcus aureus oligopeptid
e transporter putative substrate binding domain (opp-1A), ol
igopeptide transporter putative membrane permease domain (op
p-1B), oligopeptide transporter putative membrane permease d
omain (opp-1C), oligopeptide transporter putative ATPase dom
ain (opp-1D), and oligopeptide transporter putative ATPase d
omain (opp-1F) genes, complete cds; and unknown gene. NID: g
3800817.

atgctcaaacgtacaattaaattcatactttatttaatcgtaagttcgtttattatcttc
attttagttgagaagacatctggtaatccagcgattctgtatctacaacgtcatggttat 
acgtcgattacgcaagacaatattgaagcggcacaacatcaacttggcttaggacaacat 
gtgttactaagatatatcgattgggttggacatgcactcacgggcaacttaggata;.:ggc 
tt\;a j tcic;gaacgaagcagttaccgctatgataatggaagccatcgtgccgacgc':cgtg 

ctaatcattgtctctagttgtatcatgttgccatttggctatattgttggttacttcgtt
gggacgcgtccgcatacacgttacgctaatggaattcgtggattcgcccaagtgatgacc 
tcaatgccagaatactggttagctattttattcatttattatttaggcgtacgttggcaa 
ttgttaccatttgtaggtagtgattcatggcaacactttgtgctgccaatcttcacaatt
gttgttatagaagggtgtcatatcttattgatgacagcacatctgattacacaaacgtta 

gatcaagatgcgtatcaactggcgcagttaagacatttttcgttaaaagcgcgtatcatc
gtacaaattaaagagatatttgcaccac 

Sequence 2688 

MLKRTIKFILYLIVSSFIIFILVEKTSGNPAILYLQRHGYTSITQDNIEAAQHQLGLGQH 
VLLRYIDWVGHALTGNLGYGFSTNEAVTAMIMEAIVPTLVLIIVSSCIMLPFGYIVGYFV
GTRPHTRYANGIRGFAQVMTSMPEYWLAILFIYYLGVRWQLLPFVGSDSWQHFVLPIFTI 
VVIEGCHILLMTAHLITQTLDQDAYQLAQLRHFSLKARIIVQIKEIFAPX 

Sequence 2689 
Contig_0804_pos_3696_3097,

is similar to (with p-value 5.0e-33) 

>gp:gp|AF068901|AF068901_4 Streptococcus pneumoniae penicill
in-binding protein 2b (pbp2b), RecM (recM), D-Ala-D-Ala liga
se (ddl), D-Ala-D-Ala adding enzyme (murF), MutT (mutT), cel
l division protein FtsA (ftsA), cell division protein FtsZ (
ftsZ), YlmE (ylmE), YlmF (ylmF), YlmG (ylmG), YlmH (ylmH), c
ell division protein DivIVA (divIVA), and isoleucine-tRNA sy
nthetase (ileS) genes, complete cds; and unknown gene. NID:
g4009462.

atgataaacgtaacattagagcaaattaaaaactggatagattgtgaaattgatgaaaaa
catttaaaaaaaacaataaatggcgtttcaattgattcacgaaaaatcaatgaaggggcg 
ttatttataccttttaaaggtgagaatgttgatggccatcgttttatcacacaagctttg 
aacgatggtgctggagctgtttttagtgaaaaagagaataaacattctgaagggaaccaa 
ggtcctattatttgggtagaagatactttaatagccttacagcaattggcaaaagcatat
ct ba.?. t r:/i t gt aaa tcctaaggtgatagcggtt act ggttctaatggaaaaacaa: a aca
aaagacatgattgaaagtgtattatcaactgaatttaaagttaagaaaacacaaggaaat 
tataataatgaaattggaatgccgttgactttactagaacttgatgaagacacagaaatt 
tctattctagaaatggggatgtcaggttttcatcaaatagagttgttatctcatatcgca 
caacctgatatagcggtcatcacaaacattggcgaatcacatatgcaagatttaaaataa



Sequence 2690 

MINVTLEQIKNWIDCEIDEKHLKKTINGVSIDSRKINEGALFIPFKGENVDGHRFITQAL 
NDGAGAVFSEKENKHSEGNQGPIIWVEDTLIALQQLAKAYLNHVNPKVIAVTGSNGKTTT
KDMIESVLSTEFKVKKTQGNYNNEIGMPLTLLELDEDTEISILEMGMSGFHQIELLSHIA 
QPDIAVITNIGESHMQDLK* 

Sequence 2691 
Contig_0804_pos_2647_2078,

is similar to (with p-value 3.0e-57) 

>gp:gp|Y17795|SAU17795_2 Staphylococcus aureus prfA, pbp2 ge
nes. NID: g3955029.

atggtaggttgtcgcattagatatgaatactatcgttcggcttatggacatggtgtgtca 
ggr'gtaaacatgggtgcgaaaactggtactggtacgtatggacaagaaatatacgf.aaag
tataatttacctgataatgctgccaaagatgtttggattaatggtttcagtccagaatat 
actatgtcggtatggatgggcttcaataaggttaaacaatatggaacaaattcatttatc 
ggtcattcagaacaagattatccacaatacttgtatgaagatgtgatgtctagtatctca 
tctaaagatggtgaagatttcaaaaagcctaatgatgtacaaggaagttcaccggacagt 
ctatctgtatcaggtcattctgataataatactactaaccgtagtgttcatggaagtagc
gatacatcttcttcatcaaatggtggctctaactcagcatcaagtggaaacaactcgaat 
agttcgaatggtaccagtcaaggtaactcaggcaatgcatttacacgtctgttcaattta 
aactctatattcgattataaagtttcataa 

Sequence 2692

MVGCRIRYEYYRSAYGHGVSGVNMGAKTGTGTYGQEIYEKYNLPDNAAKDVWINGFSPEY
TMSVWMGFNKVKQYGTNSFIGHSEQDYPQYLYEDVMSSISSKDGEDFKKPNDVQGSSPDS 
LSVSGHSDNNTTNRSVHGSSDTSSSSNGGSNSASSGNNSNSSNGTSQGNSGNAFTRLFNL 
NSIFDYKVS* 


Sequence 2693 

Contig_0804_pos_913_473,

putative peptide of unknown function 

atgcttttttcttacttatcattaggattctctccattcaaccagtcaaatttttcagta 
gctcatttttacaaagatgttcacatgactgtttttcaatggatattctctgcattaact
ttatggattggggttagtttatttttgactttaggattgattatcgctcaactcaacgat 
attcaaaaagcaagtagttttgccaatttacttaatattacactagctatattaggaggt 
ctatggtttccagtatacacgtttcctgattggcttcagtcgatttctaaacacatgcca 
acatataatttaaagctacttgctatagatttagcgcaaaataaaggggtgaatatagaa 
gcgtttggctatctcgtggtctattgtataatctttgtgagtattgctttattcatgaat
aagaaaggagatgtacactaa 

Sequence 2694

MLFSYLSLGFSPFNQSNFSVAHFYKDVHMTVFQWIFSALTLWIGVSLFLTLGLIIAQLND 
IQKASSFANLLNITLAILGGLWFPVYTFPDWLQSISKHMPTYNLKLLAIDLAQNKGVNIE
AFGYLVVYCIIFVSIALFMNKKGDVH* 

Sequence 2695 
Contig_0804_pos_0_452,
putative peptide of unknown function

gtgagaataggggaattaagtactttaatatatttaatatttccgattttagctatattt 
gttgacaaacgtggtaatttcttaacttatttaattgtttgtactatttttataatcagc 
tatgtgacgatgattatattttataaataccttagtgatagtattttatattcattactg 
gttattcattatttaggaatcttttattttgtctatagtgtcaatcctatgaatagr.ttg 
tttttcttctatagtgctttcgccttaccttttatttttaatgttcgtgttgtatcaaaa
gaatttataacctttttaatagctatgataagttgtttaatactaacttatatatttaat 
ccaacatttgtggttccattaagtgcattttatttggttatattaattgttgctgtaggt
aattttaaaaatagagacgaacgaattatgac


Sequence 2696 

VRIGELSTLIYLIFPILAIFVDKRGNFLTYLIVCTIFIISYVTMIIFYKYLSDSILYSLL 
VIHYLGIFYFVYSVNPMNSLFFFYSAFALPFIFNVRVVSKEFITFLIAMISCLILTYIFN 
PTFVVPLSAFYLVILIVAVGNFKNRDERIMT 


Sequence 2697 

Contig_0805_pos_575_1141, 

putative peptide of unknown function 

atgaaaggtttaattattatagggagtgctcaagtagggtctcatacgaacgctttatca 
aaatatttaaaaggtcaactcggcgaacatgatgttgaggtggaaatctttgacctagct
gagaaacccattcatcaattggattttgctggtacaacacaagcagttgatgaaattaaa 
aacaatgtcaaatctttacaaaataaagcaatggaagcagatttcttaattttaggaacg
ccaaattatcatggatcgttttcaggtattcttaaaaatgcacttgaccaccttaatatg
gaccatttcaaaatgaaacccgtgggactcatttgcaatagtggaggaatagtaagttct 
gagccattatcacacttgagagtcatcgtacgtagtttacttggtattgctgtaccaacg
caaattgctacacatgattctgattatgctaaattagaagatggtaccttatacttagaa 
gataatgaatttcaactacgttcaaaattgtttgttgatcaaattgtatccttcgtaaca 
aatagtccatatgaacacttaaaataa 

Sequence 2698

MKGLIIIGSAQVGSHTNALSKYLKGQLGEHDVEVEIFDLAEKPIHQLDFAGTTQAVDEIK 
NNVKSLQNKAMEADFLILGTPNYHGSFSGILKNALDHLNMDHFKMKPVGLICNSGGIVSS 
EPLSHLRVIVRSLLGIAVPTQIATHDSDYAKLEDGTLYLEDNEFQLRSKLFVDQIVSFVT 
NSPYEHLK* 


Sequence 2699 

Contig_0805_pos_1585_2421,

is similar to (with p-value 1.0e-95)

>gp:gp|Y17116|SEY17116_1 Staphylococcus epidermidis gene enc
oding fibrinogen-binding protein, complete CDS. NID: g320154
9.

atgattaataaaaaaaataatttactaactaaaaagaaacctatagcaaataaatccaat 
aaatatgcaattagaaaattcacagtaggtacagcatctattgtaataggtgcaacatta 
ttgtttggtttaggtcataatgaggccaaagctgaggagaattcagtacaagactttaaa 

gattcgaatacggatgatgaattatcagatagtaatgatcagtctagtgatgaagaagag
aatgatgtaattaataataatcagtcaataaactctgatgataataaccaaataaataaa 
aaagaagaaacgaataacaacgatggtatagaaaaaagctcagaagatagaacagagtca 
acaacaaatgtagatgaaaacgaagcaacatttttacaaaagtcccctcaagataatact 
catcttacagaagaagaggtaaaagaaccctcatcagtcgaatcctcaaattcatcaatt 

gatactgcccaacaaccatctcacacaacaataaatagagaagaatctgttcaaacaagt
gataatgtagaagattcacacgtatcagattttgctaactctaaaataaaagagagtaac 
actgaatctggtaaagaagagaatactatagagcaacctaataaagtaaaagaagattca 
acaacaagtcagccgtctggctatacaaatatagatgaaaaaatttcaaatcaagatgag 
ttattaaatttaccaataaatgaatataaagtaacgaaacttagcgtcactttcttcatt 

gaaaagaaccgtgaaatacttttgactttcatatcaattctccttatgaattattaa

Sequence 2700 

MINKKNNLLTKKKPIANKSNKYAIRKFTVGTASIVIGATLLFGLGHNEAKAEENSVQDFK
DSNTDDELSDSNDQSSDEEENDVINNNQSINSDDNNQINKKEETNNNDGIEKSSEDRTES 
TTNVDENEATFLQKSPQDNTHLTEEEVKEPSSVESSNSSIDTAQQPSHTTINREESVQTS
DNVEDSHVSDFANSKIKESNTESGKEENTIEQPNKVKEDSTTSQPSGYTNIDEKISNQDE 
LLNLPINEYKVTKLSVTFFIEKNREILLTFISILLMNY* 



Sequence 2701 
Contig_0805_pos_4398_4919,

putative peptide of unknown function 

atactcattgatatagttgttcttcttattatttgttactttatagtgatagggtl-vcgt 
agagotatttggttatcgatattgcactttgcttcttcaattgtatctttatata • tgcg 
tcacaacattatcaatctattgcgcaacgtttagttgtctttgtgccatttccgaaaacg
gtggcgtttgacatggtctatactattccttatgatcatttgcaatacagatttgaaaaa
gtgatagcatttattataatatttggtatgtgtaagcttattttgtatctagttgttgtt 
acatttgataatataataacgtataaaaagatacatttagtaagtcggatatcgagtgtc 
gttttgagtatcatagcggtttttatatatttacaaattggactttatttattatcgcta 
tatccgcattcatttatacagtaccaattatctcaatcgctattaagtcgagttgtgatt
gaacaaattccttatttatcacaatttattttaaatttataa 

Sequence 2702 

MLIDIVVLLIICYFIVIGFRRGIWLSILHFASSIVSLYIASQHYQSIAQRLVVFVPFPKT
VAFDMVYTIPYDHLQYRFEKVIAFIIIFGMCKLILYLVVVTFDNIITYKKIHLVSRISSV
VLSIIAVFIYLQIGLYLLSLYPHSFIQYQLSQSLLSRVVIEQIPYLSQFILNL*

Sequence 2703 
Contig_0805_pos_4994_5461,

putative peptide of unknown function

atgacaaaaaaagatgtaattcaattattagaaaaaatagctatatatatggagctaaaa 
ggagaaaatacatttaaagtttcagcgtatagaaaagccgcacaaagtctagaggttgat 
gagcgtacattagaagagattgatgatgtaacagaacttaaaggcattggaaaaggcgta
ggagaagttattaatgaatttaaaacacaaggtcaatcatcgacccttcaagcacttcaa

gatgaagtacctgaagggttagtgccacttttgaaaatacaaggattaggcataccattg
ataattattatatttttcaacaattggatctatccacatatccatcgctttacaatctgg 
acaccagtcggcttcaaacttgataattactggatcagtactaccaatagtagaattaaa 
ttcttcggtttgatgaatatttttcatcactatctactccttttgtaa 

Sequence 2704

MTKKDVIQLLEKIAIYMELKGENTFKVSAYRKAAQSLEVDERTLEEIDDVTELKGIGKGV
GEVINEFKTQGQSSTLQALQDEVPEGLVPLLKIQGLGIPLIIIIFFNNWIYPHIHRFTIW 
TPVGFKLDNYWISTTNSRIKFFGLMNIFHHYLLLL* 

Sequence 2705

Contig_0805_pos_3875_2949,

is similar to (with p-value 2.0e-34)

>sp:sp|O07874|RNH2_STRPN RIBONUCLEASE HII (EC 3.1.26.4) (RNA
SE HII). >gp:gp|U93576|SPU93576_1 Streptococcus pneumoniae r
ibonuclease HII (rnhB) gene, complete cds. NID: g2209338.

atgggaaatgtcgtatacaaactcacgtcaaaagaaattcaatcattgatggctcaaact 
acttttgagacgacgaagttacctcaaggtatgaaagctcgtacgagatatcaaaatact 
gtta':oc.> tatctatagttctggcaaagtaatgtttcaaggtaagaatgctgaacr. actt 
gcgagtcaattgctaccaaataaacaatcaacaactggcaaacatacatcatcaaataca 

actagtattcaatataatcgttttcattgtattggaagcgatgaagcaggcagtggcgac
tattttggtccattgactgtatgtgcagcttatgtgagccaatcacatatcaaaatctta 
aaagaacttggtgtagatgattcaaaaaaactaagcgatactaaaatcgtcgatcttgca 
gaacagctcattacctttatcccgcattctttattaacattagataatgttaagtataac 
gaacgacaaagtctaggatggtctcaagttaaaatgaaagctgtcttacataatgaagct 

atcaaaaatgtgcttcaaaaaattgagcaagatcaactggattatattgttattgatcaa
tttgcaaagcgagaagtttatcaacattatgcattatcagcattaccttttcctgacaaa 
acaaaatttgaaacaaaaggtgaatctaaatcactagcaatcgcggcagcaagcattatt 
tctcgttatgcatttgttaaacacatggaccacatctctaaaaaactccatatggaaata 
ccaaaaggagcaagtaacaaagtagatttaattgccgctaaagtcattcaaaaatatgat 

attcaacaacttgatactatttcaaaaaaacattttaaaaacagagataaagcaattcat
cttatgaatcaaaaatacaataaataa 

Sequence 2706 

MGNVVYKLTSKEIQSLMAQTTFETTKLPQGMKARTRYQNTVINIYSSGKVMFQGKNAEQL
ASQLLPNKQSTTGKHTSSNTTSIQYNRFHCIGSDEAGSGDYFGPLTVCAAYVSQSHIKIL
KELGVDDSKKLSDTKIVDLAEQLITFIPHSLLTLDNVKYNERQSLGWSQVKMKAVLHNEA 
IKNVLQKIEQDQLDYIVIDQFAKREVYQHYALSALPFPDKTKFETKGESKSLAIAAASII
SRYAFVKHMDHISKKLHMEIPKGASNKVDLIAAKVIQKYDIQQLDTISKKHFKNRDKAIH 

LMNQKYNK*

Sequence 2707 

Contig_0806_pos_2408_3289,

is similar to (with p-value 0.0e+00)

>sp:sp|P37565|YACC_BACSU HYPOTHETICAL 31.8 KD PROTEIN IN FTS
H-CYSK INTERGENIC REGION. >gp:gp|D26185|BAC180K_134 B. subti
lis DNA, 180 kilobase region of replication origin. NID: g46
7326. >gp:gp|Z99104|BSUB0001_71 Bacillus subtilis complete g
enome (section 1 of 21): from 1 to 213080. NID: g2632267.

atga racr,tgattatatagtgagaggtttagcatacggtggggaaataagagcat.?tgct
gcaatcacaacagagtcagtacaagaagcacaaacacgtcattatacatggcctactgct 
tctgccgctatgggaagaactatgacagctactgttatgatgggtgcaatgttaaaagga 
aaccaaaagttaacagttactgttgatggcaaaggtccaattggcagaattattgctgac 
gcagatgctcaaggaaatgttcgtgcatatgtagaccatccacaaacgcattttccactc 

aacgatcaaggtaaattggatgtacggcgagcagttggtactgatggttccattcaggtt
gttaaagatgttggaatgaaagactacttttctggtgcgagtccaatagtatcaggtgag 
ctaggagatgatttcacatactactatgccacaagtgaacaaacaccatcatcagtagga 
ttgggtgtattagttaatccagacaactcaatcaaagcagcgggaggatttattattcaa 
gttatgccaggtgctactgatgaaacggtgactaaattagaagaagccattagtcaaatg 

caacctgtatcgaaattaattgagcaaggacttacacctgaaggaatattaaatgaaatt
ttgggtgaaggtaatgttcaaattttaaattcaacgtcagcgcaatttgaatgtaattgt 
agtcatgagaaatttttaaatgctattaaaggtttaggagaggcagaaattcatagcatg 
attaaagaggatcatggagctgaagctgtatgtcacttctgtggtaataaatatcagtat 
agtgaaagtgaattagaagatttattagaaacaatgaaatag 


Sequence 2708 

MTHDYIVRGLAYGGEIRAYAAITTESVQEAQTRHYTWPTASAAMGRTMTATVMMGAMLKG 
NQKLTVTVDGKGPIGRIIADADAQGNVRAYVDHPQTHFPLNDQGKLDVRRAVGTDGSIQV 
VKDVGMKDYFSGASPIVSGELGDDFTYYYATSEQTPSSVGLGVLVNPDNSIKAAGGFIIQ 
VMPGATDETVTKLEEAISQMQPVSKLIEQGLTPEGILNEILGEGNVQILNSTSAQFECNC
SHEKFLNAIKGLGEAEIHSMIKEDHGAEAVCHFCGNKYQYSESELEDLLETMK* 

Sequence 2709 
Contig_0806_pos_3617_4558,
is similar to (with p-value 0.0e+00)

>sp:sp|P37887|CYSK_BACSU CYSTEINE SYNTHASE (EC 4.2.99.8) (O-
ACETYLSERINE SULFHYDRYLASE) (O-ACETYLSERINE (THIOL)-LYASE) (
CSASE).

gtgttttggatggcacaaaaacctgtagattatgttacacaaattattgggaatacacct 
gtagtcaaattaagaaacgttgttgatgatgatgcagctgatatttatgttaagttagaa
tatcaaaatccaggtggttcggtaaaagatcgtatcgctttagcgatgattgaaaaagct 
gagcgtgaagggaaaattaaacctggtgatacaatcgttgagcctacgagtggtaacact 
ggtataggtctagcatttgtatgtgctgccaaggggtacaaagcagtttttacaatgcct 
gaaacaatgagccaagagcgccgtaacttattaaaagcttatggtgctgaactagtatta 
acaccaggatctgaagctatgaaaggtgcaataaaaaaagctaaagaattaaaagaagag
cacggctattttgaaccacaacaattcgaaaatccagcaaatcctgaaattcatgaactt 
acaactggaccagaattagttgaacaatttgaaggtcgacaaattgatgcatttttagct 
ggtgtaggaactggtggtacgttatctggtgttggtaaagtattgaagaaagaatatcca 
aatgtggaaatagtagctattgaacctgaagcttctccagtattaagcggtggtgaacca
ggccctcataaattacaaggattgggagcaggtttcgtacctgatactttaaatacagaa
gtttatgacagcatcatcaaagtaggtaatgatactgctatggatatggcacgtcgtgtt 
gctagagaagaaggtatattagcaggtatttcatctggtgctgcaatatatgctgctatt 
caaaaagcaaaagaattaggtaaaggtaaaacagttgtaacagtattaccaagtaatggg 
gaacgttacttatcaacaccattatattcatttgataattaa 
Sequence 2710

VFWMAQKPVDYVTQIIGNTPVVKLRNVVDDDAADIYVKLEYQNPGGSVKDRIALAMIEKA
EREGKIKPGDTIVEPTSGNTGIGLAFVCAAKGYKAVFTMPETMSQERRNLLKAYGAELVL 
TPGSEAMKGAIKKAKELKEEHGYFEPQQFENPANPEIHELTTGPELVEQFEGRQIDAFLA
GVGTGGTLSGVGKVLKKEYPNVEIVAIEPEASPVLSGGEPGPHKLQGLGAGFVPDTLNTE 
VYDSIIKVGNDTAMDMARRVAREEGILAGISSGAAIYAAIQKAKELGKGKTVVTVLPSNG
ERYLSTPLYSFDN* 

Sequence 2711

Contig_0806_pos_5681_6160, 

putative peptide of unknown function 

gtgaaaacttataacaataatcaatctgtgggtaacttcttaagtgtttcagtacaagat 
ggtcaaacagttaaacaaggtgaacgtatcatcaattatgatacaaatgggaataaacgc 

caacaactattgaacaaagtgaatcaagcacaatctcaagttaatgatgattatcaaaaa
gtaaatcaaagtcctaacaatcatcaattacaagttaaattgactcaagatcaaagtgct 
ttaaatgaagctcagcagtcattgtcacaatatgacagacaactcaatgacagcatgaat 
gcatcatttgatggtaaaattaacattaaaaatgattcagatgtaggcgaagggcaacct 
attttgcaattaatttcttcaaatcctcaaattaacgcaactatcacagagtttgatatt 

aataaaattaaagaaggcgatgaagtaaatgtcactgtaaatagcaaatgtcccatataa



Sequence 2712 

VKTYNNNQSVGNFLSVSVQDGQTVKQGERIINYDTNGNKRQQLLNKVNQAQSQVNDDYQK 
VNQSPNNHQLQVKLTQDQSALNEAQQSLSQYDRQLNDSMNASFDGKINIKNDSDVGEGQP
ILQLISSNPQINATITEFDINKIKEGDEVNVTVNSKCPI*

Sequence 2713 
Contig_0806_pos_6623_6060, 

putative peptide of unknown function

atgagaaaaggaaatcagaatgaagctttagaagaatttatcggaactttattaaaagat 
gagcaatattattatgagttagcatttttagaaagtgaaacacaaaatcttgaaatcata 
atggagaagatgattaagcaaggaattacaaaatttcgtattgtacctttactcattttt 
agtgcaatgcattatatcagtgatattccacaaatacttaaagagatgaaagctcgatat 

ccacaaattgatagtaaaatgagtgcgcctcttggtacacatccatatatgaaaacatta
gtagaaaatagaattgctgatgaaaaagtcagtgaaggttcaaccaaagcaactatagta 
attgcccatggaaatggaagtggacgttttacgaaagcacatgatgaattaaaagcattt 
gttaaaacgcttgatagtcatcatcctgtttatgcaagagctttatatgggacatttgct
atttacagtgacatttacttcatcgccttctttaattttattaatatcaaactctgtgat 

agttgcgttaatttgaggatttga

Sequence 2714 

MRKGNQNEALEEFIGTLLKDEQYYYELAFLESETQNLEIIMEKMIKQGITKFRIVPLLIF 
SAMHYISDIPQILKEMKARYPQIDSKMSAPLGTHPYMKTLVENRIADEKVSEGSTKATIV 
IAHGNGSGRFTKAHDELKAFVKTLDSHHPVYARALYGTFAIYSDIYFIAFFNFINIKLCD
SCVNLRI* 

Sequence 2715 
Contig_0806_pos_5264_4722,

putative peptide of unknown function

atgagtacacaggaattaaaagaaaatgatattcaaaactcaagtattccatttgttaat 
cattttaacaaactaagacatcagccaaaatggatacttaaatcaatcatagtgatagta 
ttagctattattagcgctttcatcacatataatactagtaatgaactattagataaccaa 
tcgatagctaatagtcaaatggatgaaaatatgtttcgtatgtctacaactataggcgct 

ttcataggaacaatctttagtgttgttgtagtatttttaatctttctaattatttctaaa
atatttaaatctgacgccaaagcgagcagtttattctcagcagcactctcttattcaatc 
attattttaggatttacaactattatttctttaattcaaatagtttttggactaaagata 
acagactataagcttgatagcttaaacattttttcaaaggataataaaacactcatggac 
atctgctacttcatgatatcattcatttcaataagttttgagtatatattatcggcatat 
taa

Sequence 2716 

MSTQELKENDIQNSSIPFVNHFNKLRHQPKWILKSIIVIVLAIISAFITYNTSNELLDNQ 
SIANSQMDENMFRMSTTIGAFIGTIFSVVVVFLIFLIISKIFKSDAKASSLFSAALSYSI
IILGFTTIISLIQIVFGLKITDYKLDSLNIFSKDNKTLMDICYFMISFISISFEYILSAY
*

Sequence 2717 
Contig_0806_pos_833_267,

is similar to (with p-value 3.0e-54) 

>sp:sp|P37476|FTSH_BACSU CELL DIVISION PROTEIN FTSH HOMOLOG
(EC 3.4.24.-). >gp:gp|D26185|BAC180K_132 B. subtilis DNA, 18
0 kilobase region of replication origin. NID: g467326. >gp:g
p|Z99104|BSUB0001_69 Bacillus subtilis complete genome (sect
ion 1 of 21): from 1 to 213080. NID: g2632267.
atgatacatggggcatttttcttagcattttcaaataaatcacgaacacgactcgcacca
acaccaacaaacatctcaacgaagtcagatccactaattgagaagaatggtgcaccagct 
tcacctgcaaccgcacgtgctaataatgttttacctgtacctggaggcccaacaagtaag 

acaccttttggaattcttgaacccatttgtttaaatttcttgttatcttttaagaaatct
acaatttctattaattcttgtttctcttcgtcagctcctgctacatctgagaaacgaact 
cgacgtttattactgtcgtacatcttagctttggattttccaaagttcatcatacgacca 
ccaccgccgccaccttgggcttggctaaggaagaaaataaataataatgcaatgattaat 
acaggaatcagtgtcgttaaaatactaacgaatacactttgtttttcttcttctttaact 

gtaaatttaagattatcttgtttcttagctgtatctgtgattttttgtaaatctttttca
ttgttatataaaattgtcgatgagtaa 

Sequence 2718 

MIHGAFFLAFSNKSRTRLAPTPTNISTKSDPLIEKNGAPASPATARANNVLPVPGGPTSK 
TPFGILEPICLNFLLSFKKSTISINSCFSSSAPATSEKRTRRLLLSYILALDFPKFIIRP
PPPPPWAWLRKKINNNAMINTGISVVKILTNTLCFSSSLTVNLRLSCFLAVSVIFCKSFS 
LLYKIVDE* 

Sequence 2719 
Contig_0808_pos_5584_4772,

is similar to (with p-value 1.0e-62)

>gp:gp|AJ223960|LLCAJ3960_4 Lactococcus lactis cremoris MG13
63-ir-.v>. chromosomal inversion junction DNA. NID: g36472r4.
atgaaagatcaattgagaggtaatgtcgtgattatacatcaacctgcagaagaaatgcca 

ccgggcggtgcgaaaggtatgatagatgacggcgtattagaaggggtggaccatgtttta
ggtgcacatgtgatgagtatgatggaaacaggtaaaatatactatcgtgaaggttttgtt 
caaacgggacgcgcttattttagacttgttgtgaaaggtcagggtggtcacggctcatcc 
ccacatacatcgaatgacgctattgtagcaggtgcgcattttgtaacgaccgcacagacc
attgtttcaagacgcttaaatccgtttgaaacgggtgtagtcacaataggttccttcgac 

gggaaaggacaattcaatgtgattaaagatacaataacaatcgaaggtgatgtacgtgca
ctgactgatgatacaagagacaatattcagaatgaaatgacacgactagtcagaggatta 
gaagagatgttcggagtgatttgtgattttgaatttaaaaaggattatccagctctttac 
aatgatcctgaatttacttcttatgtagcaacaactttaaaaaatgctaagctggatgac 
ataaaagcaatagatatttgtgagccgcagccaccctcagaagatttcgcgttttatgcg 

ttagaaagaccttcaacatttatttattcgggtgcagcaccagaagatggacctatgtac
cctcaccatcatcctaagtttaatatcaatgaaacatctatgcttgtagttgcagaggca 
gtgggtacaattgtattagattatttgaaataa 

Sequence 2720 

MKDQLRGNVVIIHQPAEEMPPGGAKGMIDDGVLEGVDHVLGAHVMSMMETGKIYYREGFV
QTGRAYFRLVVKGQGGHGSSPHTSNDAIVAGAHFVTTAQTIVSRRLNPFETGVVTIGSFD
GKGQFNVIKDTITIEGDVRALTDDTRDNIQNEMTRLVRGLEEMFGVICDFEFKKDYPALY
NDPEFTSYVATTLKNAKLDDIKAIDICEPQPPSEDFAFYALERPSTFIYSGAAPEDGPMY
PHHHPKFNINETSMLVVAEAVGTIVLDYLK* 



WO 01/34809

PCT/US00/30782



Sequence 2721 

Contig_0808_pos_4094_3633,
is similar to (with p-value 3.0e-18) 
>sp:sp|P38049|YIXC_BACSU HYPOTHETICAL 18.8 KD PROTEIN IN PBP
F 5'REGION. >pir:pir|B40614|B40614 hypothetical protein X (p
bpF 5' region) - Bacillus subtilis >gp:gp|L10630|BACPBPF_2 B
acillus subtilis penicillin-binding protein (pbpF) gene, 5'
end. NID: g304158. >gp:gp|Z99109|BSUB0006_86 Bacillus subtil
is complete genome (section 6 of 21): from 999501 to 1209940
. NID: g2633260. >gp:gp|Y14083|BSY14083_4 Bacillus subtilis
chromosomal DNA, region 76-78 degrees; between glyB-aprE, NI
D: g2226224.

atgaagggatgtcttatgttaattagcgatacaaataaaacatatataatagaagaaact 
agtaattcattcacaattgaaaagaataatgatcagcagcactacgaagtattagaatca
atcaacagcttatctaatgattcattttgtgtgttaaatcacttattcgtcaatggaggt 
aatgaagaggtttttgagtcacggtttttaaagcgaaatcaacatttgcaagatgtgcct 
ggttttaaagcgttaagatttcttagaccggtagtcaaagggagacattacattatcatc 
acgctatggaacagtagacaagctttctatgattggcaaaattcacaagcatatgcgcaa 
actcataaaaaacgtggaactcaaaaaggtgttgatcatcgtatagtcaatagagattta
tcctataatataagaatagagttagaaagtcttaataactaa 

Sequence 2722 

MKGCLMLISDTNKTYIIEETSNSFTIEKNNDQQHYEVLESINSLSNDSFCVLNHLFVNGG 
NEEVFESRFLKRNQHLQDVPGFKALRFLRPVVKGRHYIIITLWNSRQAFYDWQNSQAYAQ
THKKRGTQKGVDHRIVNRDLSYNIRIELESLNN* 

Sequence 2723 

Contig_0808_pos_1986_1399,

is similar to (with p-value 4.0e-51)

>sp:sp|P37470|SP5C_BACSU PROBABLE PEPTIDYL-TRNA HYDROLASE (E
C 3.1.1.29) (PTH) (STAGE V SPORULATION PROTEIN C). >gp:gp|D2
6185|BAC180K_116 B. subtilis DNA, 180 kilobase region of rep
lication origin. NID: g467326. >gp:gp|Z99104|BSUB0001_53 Bac
illus subtilis complete genome (section 1 of 21): from 1 to
213080. NID: g2632267.

gtggaggtaacaataatgaaatgcattgtcggtcttggcaacattggtaaacgttttgaa
ttaacaagacataatattggtttcgaagttgtcgatgatattctagaacgccaccaattt 
actttagacaaacaaaaatttaaaggtgcatatactattgaacgtttaaacggcgaaaaa 

gtattatttattgagccaatgaccatgatgaacttatctggtcaagctgtagccccttta
atggattattataatgtcgatgttgaagatttgatcgttttatatgacgatttagattta 
gaacaaggacaagtgcgtctgcgccaaaaggggagtgcaggcggtcataatggtatgaaa 
tcgataattaaaatgcttggtacagatcaatttaaacgtattcgaattggtgttggccgt 
ccaacaaatgggatgtctgttccggactatgttttacaaaaattttcaaaagaagaaatg 

atcattatggaaaaggtaattgaacattctgcaagagctgtagaatcttttattgaaagt
tctcgttttgatcatgttatgaatgaatttaatggtgaagtcaagtga 

Sequence 2724 

VEVTIMKCIVGLGNIGKRFELTRHNIGFEVVDDILERHQFTLDKQKFKGAYTIERLNGEK
VLFIEPMTMMNLSGQAVAPLMDYYNVDVEDLIVLYDDLDLEQGQVRLRQKGSAGGHNGMK
SIIKMLGTDQFKRIRIGVGRPTNGMSVPDYVLQKFSKEEMIIMEKVIEHSARAVESFIES 
SRFDHVMNEFNGEVK* 

Sequence 2725 
Contig_0808_pos_0_1393,

is similar to (with p-value 4.0e-65) 

>gp:gp|AF054624|AF054624_1 Lactobacillus sakei transcription
-repair coupling factor (mfd) gene, partial cds; L-lactate d
ehydrogenase (ldhL) gene, complete cds; and unknown genes. N
ID. v;3':'a014.

atgattgcaaattacattagcgaagataatcgttttcaagaattagatgaagtcttcggc 
caagagaatattttagttacgggattatccccgtcagcgaaggcaacaattattgctgaa 
aaatatttaaaagatcataaacaaatgctactcgtaactaataatttataccaagcagat 
aaaatcgaaactgacattttacaatatgtagatgactcagaagtttataaatatcctgtt
caagatataatgactgaagagttttctactcaaagtccacaattgatgagtgagcgtgtt 
agaacgttgactgcgttagcccaaggcgaaaaagggttatttattgtgcctttaaacggc 
tttaaaaaatggctaacaccggttgatttatggaaagatcatcaaatgacgcttaaagta 
ggtcaggatattgatgttgatgcattcttaaataaattagttaatatgggttatcgccga 

gaaagtgtagtgtcacatattggtgagttttcattgcgtggtggaatcatagatatatat
ccgttgataggaacacctgtgagaatagagctatttgatactgaagttgattccatcaga 
gactttgatgtagaaacacaacgttctaacgataatataaatcaagttgaaatcacaaca 
gctagtgactacattattactgatgaagtgatacaacacttacaaaatgaacttaaaaaa 
gcatatgaatatacacgccctaaaattgaaaagtccgtacgtaatgatttaaaagagaca 

tatgaaagttttaagttgtttgaatctacctttttcgatcatcaattgttacgacgtctt
gtttcatttatgtatgaaaaaccatcaacccttattgactattttcaaaaaaacgcgatt 
attgtagttgatgagtttaatcgtattaaggaaacagaagaaacacttacaacagaagtt 
gaagattttatgagtaacttaattgagagtggaaatggatttatcggacaaggatttatg 
aagtatgaaagttttgacgcattattagagcaacatgccgttgcatatttcacattattt 

acf*tcttcgatgcaagtaccattacaacatattattaagttctcttgtaaaccagt*.caa
caattttatggtcaatatgacattatgcgctcggaatttcaaagatacgtgcatcaagat 
tacactgtcgtagttcttgttgaaactgaaacaaaagttgaacgtattcaatcaatgctt 
aatgaaatgcatattccaacagtatcaaatattcacgaagatattgatggtggtcaagtt 
gtagtgacggaaggtagtctttctgaaggctttgaattaccttatatgcagttggtagtc 

atcacagaaagag

Sequence 2726 

MIANYISEDNRFQELDEVFGQENILVTGLSPSAKATIIAEKYLKDHKQMLLVTNNLYQAD 
KIETDILQYVDDSEVYKYPVQDIMTEEFSTQSPQLMSERVRTLTALAQGEKGLFIVPLNG 
FKKWLTPVDLWKDHQMTLKVGQDIDVDAFLNKLVNMGYRRESVVSHIGEFSLRGGIIDIY
PLIGTPVRIELFDTEVDSIRDFDVETQRSNDNINQVEITTASDYIITDEVIQHLQNELKK 

AYEYTRPKIEKSVRNDLKETYESFKLFESTFFDHQLLRRLVSFMYEKPSTLIDYFQKNAI 
IVVDEFNRIKETEETLTTEVEDFMSNLIESGNGFIGQGFMKYESFDALLEQHAVAYFTLF 
TSSMQVPLQHIIKFSCKPVQQFYGQYDIMRSEFQRYVHQDYTVVVLVETETKVERIQSML 
NEMHIPTVSNIHEDIDGGQVVVTEGSLSEGFELPYMQLVVITERX

Sequence 2727 

Contig_0810_pos_182_592,

putative peptide of unknown function 

atci g'aa:^acgaatgcgcaaaactgaaagcttggttatcaataatcctgtgttaci. jaat
gaacatgaagacgaagcagatatactgtatataggatttatatctactaaaggtgctatt 
ggagaaggtgcagaaagactagaacgacatggtgtaaaagtgaatacgatgcatattcga 
caattacatcctttccctaaagatattgttcaacaagctattaataaagcttcgaaagta 
atagttgcagaacataattatcaaggacaattatcaagtattttaaaaatgaacacacaa 

45 gttaatgataaattagttaatcaaacaaaatacgatgggaaacctttcttaccttatgaa 
attgaagaaaaaggtttggaaattgctaaagagttaaaggagttggtgtaa 

Sequence 2728 

MEKRMRKTESLVINNPVLLNEHEDEADILYIGFISTKGAIGEGAERLERHGVKVNTMHIR 
QLHPFPKDIVQQAINKASKVIVAEHNYQGQLSSILKMNTQVNDKLVNQTKYDGKPFLPYE
IEEKGLEIAKELKELV*

Sequence 2729 
Contig_0810_pos_2045_3106,
putative peptide of unknown function

atggacaaatttaaatctatgacagaattaaaagaattgactaaagaaggaaaagattgg 
gaaatagagtgtgaaaatcgttctagcatagtcactatattagcattacatggcggtgga 
attgaacctgccacaactgaattagcctatacaattgcacattgtggcgactataactat 
ttttcctttaaaggtatgagaagtaaggggaataatgagttacatgtgacttccacccat
tatgatgaccaaattgcattagatttagtgagaggtagccaaagaactgtagccatccat
ggttgtgaaggtaatgaaagtgtggcttatataggaggtagtgatgacagactaattgag 
ttaatcaccgaatctcttgaagatataggaattagcgtgcgagaagcaccacatcatatt 
tctggaactcaagaaaatactgaaaatggagcagtcatctataaaggtaatcaactttac 
aattatcgaagttttgatcagtatatttttcaaaaagttgtaaattatttaaatttgaat
caaaagataaacaatttgattatttgtggtgtaaaaagtgcatatattttaaaagaaaca 
agcgaagcatttaagcaagatgcacgtacatattatcaccaactaatagaggttgactcc 
ttacaaacattacctgatgatgattatgtgaaaattgctttcaatataaatcgtcagact 
catccagacttagatgagaaattagctcttaagtttaaagacgatattaaactagtatca 
agtgggagagatagtatagatgttattatgccaaatatgactaagggtcaagctttgtct
agattattaaaagaatggcaaatgcctgcttcacatttaatggcatttggagatgcaaat 
aacgataaagatatgttggagcttgccgaacatagttatgttatggctaatagtgaagat 
caatcattatttaatatagcgagtcatgtggcaccttccaatgatgaacaaggcgtacta 
tcaacaatcgaaaatgttgttctcggttattccaataaataa 


Sequence 2730 

MDKFKSMTELKELTKEGKDWEIECENRSSIVTILALHGGGIEPATTELAYTIAHCGDYNY 
FSFKGMRSKGNNELHVTSTHYDDQIALDLVRGSQRTVAIHGCEGNESVAYIGGSDDRLIE 
LITESLEDIGISVREAPHHISGTQENTENGAVIYKGNQLYNYRSFDQYIFQKVVNYLNLN
QKINNLIICGVKSAYILKETSEAFKQDARTYYHQLIEVDSLQTLPDDDYVKIAFNINRQT
HPDLDEKLALKFKDDIKLVSSGRDSIDVIMPNMTKGQALSRLLKEWQMPASHLMAFGDAN 
NDKDMLELAEHSYVMANSEDQSLFNIASHVAPSNDEQGVLSTIENVVLGYSNK* 

Sequence 2731 
Contig_0810_pos_4526_5194,

is similar to (with p-value 2.0e-20) 

>gp:gp|AF012552|AF012552_2 Helicobacter pylori prolipoprotei
n diacylglycerol transferase (lgt) and NADPH-linked flavin n
itroreductase (rdxA) genes, complete cds. NID: g2564440.

atgattatgaatcagatgaatcaaacgattattgatgcattccattttagacatgcgaca
aaagaatttgaccctacgaaaaaaattagtgatgaagattttaatacgattttagaaaca 
ggtaaati.atctccaagttcactaggtt tagaaccttggcactttgtagtggttc.'aaat 
aaagaattgagagaaaaattgaaagcctatagttggggagcacaaaagcaacttgataca 
gcaagtcactttgtattaatttttgctcgtaagaatgtgacggctcatacagattacgtg

caacatttacttcgtggcgtcaaaaaatatgaagaaagtacaattccagcagttgaaaat
aaatttgatgatttccaagaaagtttccatattgccgataatgaacgaacattatatgac 
tgggcgagtaaacaaacatatattgcattagcaaacatgatgacaagtgctgcattacta 
ggtatcgactcatgtccaattgaaggatttgatttagataaagtgactgaaattctttca 
gatgagggtgttttagatacggaacaatttggtatttcagttatggtaggctttggttac 

agagcacaagaacctaaacatggcaaagttagacaaaacgaagacgacatcattagttgg
attgaataa 

Sequence 2732 

MIMNQMNQTIIDAFHFRHATKEFDPTKKISDEDFNTILETGRLSPSSLGLEPWHFVVVQN 
KELREKLKAYSWGAQKQLDTASHFVLIFARKNVTAHTDYVQHLLRGVKKYEESTIPAVEN
KFDDFQESFHIADNERTLYDWASKQTYIALANMMTSAALLGIDSCPIEGFDLDKVTEILS
DEGVLDTEQFGISVMVGFGYRAQEPKHGKVRQNEDDIISWIE* 

Sequence 2733 
Contig_0810_pos_4292_3300,

is similar to (with p-value 0.0e+00)

>gp:gp|U31175|SAU31175_1 Staphylococcus aureus D-specific D-
2-hydroxyacid dehydrogenase (ddh) gene, complete cds. NID: g
1644432.

atgacaaaaattatgtttttcggcacaagagcatatgagaaggacatggcattacgttgg
ggaaagaaaaataatatcgatgtcactacatcaacagaacttttaagtgtagatactgtc 
gatcaattaaaagattatgacggtgttacaacaatgcagttcggtaaattagaacctgaa 
gtttaccctaaattagagtcctatggtattaaacaaattgcacaacgtacggctggattt 
gatatgtatgacttagaacttgcaaaaaaacatgaaattattatctcgaatatacctagt 
tattcacctgaaacaattgctgaatattcggtatctatcgctctgcaactcgtacgaaaa
ttcccaacaattgaaaaacgtgtgcaagcacataatttcacatgggcgtcccctattatg 
tctcgtccagtaaaaaatatgactgtagcaatcatcggtacagggcgtattggtgc.^.gca 
actggtaaaatctatgctggttttggtgcgagagtagttggttatgatgcatatc: taat 
cattctttatctttcttagaatataaagaaacagtagaggatgcaattaaagatgctgat
attatctcattacatgtacccgctaataaagatagtttccatttatttgataacaatatg 
tttaaaaatgttaaaaaaggtgccgttttagtcaatgccgcaagaggagctgtgataaac 
acgcctgatttaattgaagcagtaaataatggtacattatcaggtgctgccattgacaca 
tatgaaaatgaagctaattatttcacatttgattgttcaaatcaaacgattgacgaccca 
atattattagacctaattagaaatgaaaatattttagttacacctcatattgcctttttc
tccgatgaagcagtacaaaatttagtagagggtggtttgaatgcagcattatcagtaatt 
aatactggcacatgtgatacgcgattaaactaa 

Sequence 2734 

MTKIMFFGTRAYEKDMALRWGKKNNIDVTTSTELLSVDTVDQLKDYDGVTTMQFGKLEPE
VYPKLESYGIKQIAQRTAGFDMYDLELAKKHEIIISNIPSYSPETIAEYSVSIALQLVRK 
FPTIEKRVQAHNFTWASPIMSRPVKNMTVAIIGTGRIGAATGKIYAGFGARVVGYDAYPN
HSLSFLEYKETVEDAIKDADIISLHVPANKDSFHLFDNNMFKNVKKGAVLVNAARGAVIN 
TPDLIEAVNNGTLSGAAIDTYENEANYFTFDCSNQTIDDPILLDLIRNENILVTPHIAFF 

SDEAVQNLVEGGLNAALSVINTGTCDTRLN*

Sequence 2735 

Contig_0812_pos_5666_6487,

is similar to (with p-value 1.0e-57)

>gp:gp|U75480|SMU75480_1 Streptococcus mutans putative HPr(s
er) kinase (ptsK) and putative prolipoprotein diacylglycerol
transferase (lgt) genes, complete cds. NID: g3924622.
atggctggatatttttcacattatgcttcagaccgtattcaattattagggacaacggag 
ttatcattttataatttacttccagatgaagagaagaaaggaagaatgagaaaattatgc 

cgacctgaaactccagcgattattgttacacgtgggttagaaccacccgaagaacttata
caagcatctcaagaaacgcatacaccaattattgttgcgaaagatgccacaacgagttta 
atgagtaggttaacgacatttctcgaacatgaactcgcgaaaactacttctttgcacggt 
gtacttgttgatgtttacggtgtaggtgtactaattacaggagattctggcattgggaaa 
agtgaaactgcattagaattagtcaaacgaggccatagattagtggctgatgataatgta 

gaaatcaaagaaattactaaggatgaacttgtagggaaaccgcctaaacttatcgaacat
ttgctagagattcgtggtctcggaatcattaatgttatgactttgtttggagcaggatca 
atattaactgaaaaacaagttcgattaaatattaatttagaaaattggaataagaataaa 
ttatacgatcgtgtaggtcttaatgaagaaacattgaaaattcttgatacggaaatcact 
aaaaaaacgataccagttagaccagggcgtaatgtagcagtaattattgaagtagctgct 

atgaattatcgtcttaatatcatgggtattaatacagcagttgaatttaatgagagactt
aatgaagaaatcgttcgaaatagtcataaaagtgaggagtaa 

Sequence 2736

MAGYFSHYASDRIQLLGTTELSFYNLLPDEEKKGRMRKLCRPETPAIIVTRGLEPPEELI
QASQETHTPIIVAKDATTSLMSRLTTFLEHELAKTTSLHGVLVDVYGVGVLITGDSGIGK
SETALELVKRGHRLVADDNVEIKEITKDELVGKPPKLIEHLLEIRGLGIINVMTLFGAGS 
ILTEKQVRLNINLENWNKNKLYDRVGLNEETLKILDTEITKKTIPVRPGRNVAVIIEVAA 
MNYRLNIMGINTAVEFNERLNEEIVRNSHKSEE*

Sequence 2737

Contig_0812_pos_6493__0, 

is similar to (with p-value 8.0e-40) 

>sp:sp|P52282|LGT_STAAU PROLIPOPROTEIN DIACYLGLYCERYL TRANSF
ERASE (EC 2.4.99.-). >gp:gp|U35773|SAU35773_1 Staphylococcus
aureus prolipoprotein diacylglyceryl transferase (lgt) gene
, complete cds. NID: g1016769.

atgaatataacattaggatatatcgatcctgttgcctttagcttgggaccaatccaagtt 
cgatggtatggaattattattgcttgtggtatcttacttggatacttcattgcacaagca 
gcattgaaacaggttggattacataaagacaccttaatcgatattatattttatagcgcg 
attgttggattcatagttgcgagaatatactttgttacatttcaatggccatattacatg
aatcacttgagtgagataccaaaaatttggcatggtggtattgccatacatggtggctta 
attggtggacttatctctgggattattgtttgtaaaatcaaaaatctacatccgtttcaa 
ataggagatattgtggc 


Sequence 2738 

MNITLGYIDPVAFSLGPIQVRWYGIIIACGILLGYFIAQAALKQVGLHKDTLIDIIFYSA
IVGFIVARIYFVTFQWPYYMNHLSEIPKIWHGGIAIHGGLIGGLISGIIVCKIKNLHPFQ 
IGDIVA 


Sequence 2739 

Contig_0812_pos_6327_5983,

is similar to (with p-value 3.0e-31)

>gp:gp|U75480|SMU75480_1 Streptococcus mutans putative HPr(s
er) kinase (ptsK) and putative prolipoprotein diacylglycerol
transferase (lgt) genes, complete cds. NID: g3924622.
gtgatttccgtatcaagaattttcaatgtttcttcattaagacctacacgatcgtataat 
ttattcttattccaattttctaaattaatatttaatcgaacttgtttttcagttaatatt 
gatcctgctccaaacaaagtcataacattaatgattccgagaccacgaatctctagcaaa 
tgttcgataagtttaggcggtttccctacaagttcatccttagtaatttctttgatttct
acattatcatcagccactaatctatggcctcgtttgactaattctaatgcagtttcactt 
ttcccaatgccagaatctcctgtaattagtacacctacaccgtaa 

Sequence 2740 

VISVSRIFNVSSLRPTRSYNLFLFQFSKLIFNRTCFSVNIDPAPNKVITLMIPRPRISSK
CSISLGGFPTSSSLVISLISTLSSATNLWPRLTNSNAVSLFPMPESPVISTPTP* 

Sequence 2741 

Contig_0813_pos_1018_2097, 

putative peptide of unknown function

atgcttgaaaaaacattcgaagtcacgtatacaaatgaacaaaaaattgaattagaagca 
caattgttttcaacacaacttttatttcaatttctcttttcgcaaggtaggttagaagaa 
gcccgaacatatattttgaatcaatcttacgagatacaacagcatagggtgattaggaat 
ttacttgcaatgtgttatttgtatctaggtgagtatgatagcgccaaagcaatgtt.^igaa 

gaactcl-.taaaggaagataattcagacgtgcatgcactttgtcactacacattattactt
tataataaaaaagaaacagaaaaatatcaaaaatatcttaaaatacttaataaagtagta 
ccactaaatgacgacgaaacctttaaattaggaatcgtattgagttatttaaaacagtat 
cgtgcttctcaaaatttactttatccactttataaaaaaggtaaatttgtctctattcaa 
atgtataatgcattgagtttcaatttttattacctaggaaataaagacgaaagtattgag 

atgtggaacaagctcactcaaatttctgaagttgatgttggttatgcaccttgggtaatt
gaggaaagtaaaacggtatttgaatcacgagtgttaccattattactagatgataataat 
cattatcgactttacggtatttttttacttcatcaattaaatggaaaagaaatactaatg
actgaagatatttggtcaattcttgaatcaatgaatgactatgagaaactttatctcaca 
tatttggtacaaggactcacactcaataaattagattttatacacagaggtatgcaaagg 

ttgtataattttaagaaattcaaatataacacgtctttatttacagattggattaatcaa
gcagaaatgattatagctgaaaatgtagatttagtagatgtcgatagatatgtagctgca 
tttgtttacctatcgtatcgtcgttctagccaaccacttaccaagaggcaattgatggac 
gattttaatgtttctagatacaaactgaataaagcaattgaatttatattgagcatataa 

Sequence 2742 

MLEKTFEVTYTNEQKIELEAQLFSTQLLFQFLFSQGRLEEARTYILNQSYEIQQHRVIRN 
LLAMCYLYLGEYDSAKAMFEELLKEDNSDVHALCHYTLLLYNKKETEKYQKYLKILNKVV
PLNDDETFKLGIVLSYLKQYRASQNLLYPLYKKGKFVSIQMYNALSFNFYYLGNKDESIE
MWNKLTQISEVDVGYAPWVIEESKTVFESRVLPLLLDDNNHYRLYGIFLLHQLNGKEILM
TEDIWSILESMNDYEKLYLTYLVQGLTLNKLDFIHRGMQRLYNFKKFKYNTSLFTDWINQ 
AEMIIAENVDLVDVDRYVAAFVYLSYRRSSQPLTKRQLMDDFNVSRYKLNKAIEFILSI* 
Sequence 2743

Contig_0813_pos_2162_2497,

is similar to (with p-value 6.0e-50) 

>gp:gp|AJ223781|SAAJ3781_1 Staphylococcus aureus trxB gene.
NID: g3582102.

atnactgaagtagattttgatgtagcaataatcggtgcaggtcctgccggtatgacagca 
gcagtatatgcatctcgtgccaatttaaaaactgtcatgattgaacgcggtatgccaggc
ggtcaaatggcaaacactgaagaagtagagaattttccaggatttgagatgatcacaggt 
cctgacttatctactaaaatgtttgaacatgctaaaaaatttggtgcggaataccaatat 
ggcgatattaaatctgttgaagataaaggcgactataaagttatcaatttagggaatata
gttgttgatgtaaatgtgtgtcacagtattgtttaa 

Sequence 2744

MTEVDFDVAIIGAGPAGMTAAVYASRANLKTVMIERGMPGGQMANTEEVENFPGFEMITG 
PDLSTKMFEHAKKFGAEYQYGDIKSVEDKGDYKVINLGNIVVDVNVCHSIV*

Sequence 2745

Contig_0813_pos_5893_6465,

putative peptide of unknown function 

atgaaaaaagtattagcactattatttgcgtcaacactcattttaggagcatgtggggac
aaaaatgacgaatctaaaaatgattcttcatctaattctacagatagcgtttctgtagac 
aaaaatgacaatgaagataaaacaacagcgaaattcaaaaatgataaattaacaacagac 
aattttgatattgaaattttagaagccaaaactgtcaaagcatctgagtacgatgatgac 
aagaaaccaagtatcgctatcatctatggtgttaaaaataaaaaagacaaagatttaaca 

gcatcttcagcatttatcgaatcatttgatatttatcaaaattctaaagatgttaagaga
agrittaotjaattggcggtggatatgatactgatttaaaggaaaaatatgaagaagatatg 
gataacaaaattaacaaagatggaaaagtaaaaggtgttatgttctttaaattgaaagac
actaaaacaccagtaacacttgaagctaaagatccaaatcatagcaataacgaaaaagtg 
ggtactaaagaatttaaaattaaagaaaaataa 

Sequence 2746

MKKVLALLFASTLILGACGDKNDESKNDSSSNSTDSVSVDKNDNEDKTTAKFKNDKLTTD
NFDIEILEAKTVKASEYDDDKKPSIAIIYGVKNKKDKDLTASSAFIESFDIYQNSKDVKR 
RLEIGGGYDTDLKEKYEEDMDNKINKDGKVKGVMFFKLKDTKTPVTLEAKDPNHSNNEKV 

GTKEFKIKEK*

Sequence 2747 

Contig_0813_pos_6510_0, 

putative peptide of unknown function 

atgtctcaaaaaattaaagttatagtgcctatcgtacttggattaatcattcttttgggg
attgcttggggtgtatatgcctttgtaacaaatacacctaaaaatgcatatttactgagc 
gaaaaaaagaccgctataaatgtaaaatcttatgttgatcatcgttttagtaacgaaaag 
aaattccaaaaaaaattaaaagataattcatatgttaatacgtataatctacatgctaat 
gcatctaaggaatatctaaaagatcttggtttacctaaaactattttagatagttctaaa 

ataactggaactatcggtcatgatccaaaatcaaataaaggaatcatgagcgtatcacct
aaaai:attragataaagatattggtaagttccaatggacagcaaatgattcaactcaattc 
ttcgaatcacccttattcaagaaaaagtatagcgtcaaaaactcagaactattagaaaca 
gctgctcaaatctttgatgaagatccttctgactataaagaagagggactttcaaatgca 
aactttgatctgaataataaattgggtattgttcattctcaacaagaagatgttaaaaaa 

ctgattaagcgatatacagatttagtcatcgatcaattagaagatgatgactttgaaaaa
ggtaaaaaagaaaaagttaaaattgacggtgaaacaaaaaatttaaaacctatcacttta 
aatataagtcgtgataaagcaaaaaaaatcactgtcgcagcattaaaaaaagctaaaaat 
gataaagaattac 

Sequence 2748

MSQKIKVIVPIVLGLIILLGIAWGVYAFVTNTPKNAYLLSEKKTAINVKSYVDHRFSNEK 
KFQKKLKDNSYVNTYNLHANASKEYLKDLGLPKTILDSSKITGTIGHDPKSNKGIMSVSP 
KILDKDIGKFQWTANDSTQFFESPLFKKKYSVKNSELLETAAQIFDEDPSDYKEEGLSNA 
NFDLNNKLGIVHSQQEDVKKLIKRYTDLVIDQLEDDDFEKGKKEKVKIDGETKNLKPITL 
NISRDKAKKITVAALKKAKNDKELX

Sequence 2749
Contig_0813_pos_5275_4484,
putative peptide of unknown function

atgacaattcttgttcattcaaagcatcaaccgagcgaatacgcagcgattgcacatcag 
ttgatggcgacaacacatgtgtgttgtgaacaagtagggttcattgaatcagtaaactat 
gaaaatggggataactatcacttggtaatgagtggaaatgaattttgtggtaatgcgact 
atgtcttacattcactatttaaaagaacgtttattgatacagcatcaacaatttcaatta 

agagtttcgggatgttctcatcctgtagagtgtaaagttcattcgcaacattatgaagtg
actatgccaaaagtacatcaagttaaggaaagatttgtgaaattaggggaccaacagttt 
aaagcatttgaaattagatacgatacatacattcattacgtgttgatgtgtgatggtgta 
gatttagcaatgaaacagcgcgtggaagattttgtcagtgcgcaaacatggcatcaacaa 
tttaaaacgattggcgtcatgctttttcaacaagataaacaattcatatatccactgata 

catatacctaaaatagatagcttaatctgggaaaatagctgtggttcaggagcggcttct
atcggtgtgttagttaattatctaacagatcatgatattcaagattacctagttaaccaa 
cccggaggcagtattattgtctcatccagaaagtctggacaaaatgaataccaaacaacg 
attaagtgtcaagtttcaactgtcgcaacaggacaagcatatatagaacaggagacaatg
acgcaaatatga 

Sequence 2750 

MTILVHSKHQPSEYAAIAHQLMATTHVCCEQVGFIESVNYENGDNYHLVMSGNEFCGNAT 
MSYIHYLKERLLIQHQQFQLRVSGCSHPVECKVHSQHYEVTMPKVHQVKERFVKLGDQQF 
KAFEIRYDTYIHYVLMCDGVDLAMKQRVEDFVSAQTWHQQFKTIGVMLFQQDKQFIYPLI 
HIPKIDSLIWENSCGSGAASIGVLVNYLTDHDIQDYLVNQPGGSIIVSSRKSGQNEYQTT
IKCQVSTVATGQAYIEQETMTQI* 

Sequence 2751 

Contig_0813_pos_3664_2447,

putative peptide of unknown function

atggttggtagtggaccggtcgctattcaacttgctcgactatgtcatttacatggagaa 
catatagttgatatggtgagtcgcgttcatgcatcaaccaaatctaagagagtctttgat 
gcttatcaacgtgacggctttttttcagtaatgactcaaaatgatgcacatcagtgtttt 
tcaggtaagtttacggttagacatttttttaaagatgttaaagatattactgaatattat 

gacgtggtgattttagcatgtactgccgatgcgtatcgaccgatattacagcaattatct
aagtccacattaaagcgtattaagcaaatcatcttggtctcaccaacattaggatcacat 
atgcttgttaagcaattactatcagatgttcaatgtgaaggtgaagtgatttcattttcc
acttc.tctaggcgatacccgaatatttgataaagcacaaccacattgtgtcctaa -:caca 
cgagttaaatcaaaattattcgtaggttcgactcaatctcagtctatgacgttgtgtaag 

cttaagtctttatttgactatttgaatatagaattaacaacgatggacacaccactacat
gcggagatacataatagttcactttatgtacacccaccattgtttatgaatcaattttca 
ttaaaggcggtatttgaagggacgaaagtaccagtatatgtatataagctatttccagag 
ggtccaatcacaatgaccttaatacacgaaatgcgattaatgtggcaagaaatgatgatg 
atattaaaaaaattaaaggtaccttcggtcaatcttctaaagtttatggtgaaagaaaac 

taccctatacgttatgagaccatgcgcgaagtagatattgaaaactttaaaaatttacca
gctattcatcaagagtatctactttatgtgcgatatacagcaattttaatcgatccgttt 
tctaatccggacgatcaaggtgcatattttgatttttctgccgtaccatacaaacatgtt 
gatactgatgaacaaggagtcatacatataccacgcatgccgagtgaagattattatcgt 
actttgataattcaagcgattggaagagcattaaacgttgcaacaccgatgattgacaca 

ttgttattacgttatgaaaatactgttaaacaatactgtgacacacatttacatcaacaa
ctatattccctaaattga 

Sequence 2752 

MVGSGPVAIQLARLCHLHGEHIVDMVSRVHASTKSKRVFDAYQRDGFFSVMTQNDAHQCF 
SGKFTVRHFFKDVKDITEYYDVVILACTADAYRPILQQLSKSTLKRIKQIILVSPTLGSH
MLVKQLLSDVQCEGEVISFSTYLGDTRIFDKAQPHCVLTTRVKSKLFVGSTQSQSMTLCK 
LKSLFDYLNIELTTMDTPLHAEIHNSSLYVHPPLFMNQFSLKAVFEGTKVPVYVYKLFPE
GPITMTLIHEMRLMWQEMMMILKKLKVPSVNLLKFMVKENYPIRYETMREVDIENFKNLP
AIHQEYLLYVRYTAILIDPFSNPDDQGAYFDFSAVPYKHVDTDEQGVIHIPRMPSEDYYR 
TLIIQAIGRALNVATPMIDTLLLRYENTVKQYCDTHLHQQLYSLN*

Sequence 2753 
Contig_0814_pos_338_709, 
is similar to (with p-value 8.0e-51)

>gp:gp|D76414|D76414_4 Staphylococcus aureus gene for histid
yl-tRNA synthetase, ppGpp hydrolase, lytic enzyme, complete 
cds. NID: g2580431. 

atr;actcrjttccaatgataaatatgtgtccctagatgatagaaatataaagggtg^-::gca 
10 tttattegtatacataatgatgcgcttgattcttcaaatgccaatggagtgactgr.ttat 
tggtttaaagacaagcaagaatcacttgcacaaactttaaattctgcaattcaaaagaag
gcattattgacaaatcgaggttctagacaacaaaattatcaagtgttgagacagacagat 
ataccagcagtactgttagagttaggctatataagtaatcctactgatgaatcaatgatt 
aatgatcaattacatagacaagtggttgaacaagctattgttgatggtttaaaacaatat 
ttctcgtcctag

Sequence 2754 

MTRSNDKYVSLDDRNIKGDAFISIHNDALDSSNANGVTVYWFKDKQESLAQTLNSAIQKK 
ALLTNRGSRQQNYQVLRQTDIPAVLLELGYISNPTDESMINDQLHRQVVEQAIVDGLKQY 
FSS*

Sequence 2755 

Contig_0814_pos_1081_2355,

is similar to (with p-value 0.0e+00)

>sp:sp|O32422|SYH_STAAU HISTIDYL-TRNA SYNTHETASE (EC 6.1.1.2
1) (HISTIDINE--TRNA LIGASE) (HISRS). >gp:gp|D76414|D76414_5
Staphylococcus aureus gene for histidyl-tRNA synthetase, ppG
pp hydrolase, lytic enzyme, complete cds. NID: g2580431.
atgattaagatgccaagaggtacccaagatatcttgccgcaagattctgctaaatggcgt

tacattgaaaatcgattacacacattaatggaattgtataattataaagaaataagaacg
ccaatttttgaaagtactgaactttttgcaagaggcgtgggggattctactgacgttgtt 
caaaaggaaatgtatacatttaaagataaaggggatcgtagtttaacattacgtcctgaa 
ggaactgcagccgttgtacgttcatatattgaacacaaaatgcaaggtgaacccaatcaa 
cctatcaaactttactacaatggtcctatgtttagatacgaacgtaaacaaaaaggaaga 

tatcgccaatttaaccaatttggtgtcgaagcaataggagcggaaaatcctagtattgat
gctgaaatactcgctatggttatgcatatatatgagtctttcggattaaagcatttaaag 
ttagttatcaatagtattggtgatagtgaatcacgtaaagaatataacgaagcattagta 
aaacattttgaacctgtgattgatacattttgttcagattgtcaatcaagattacacact 
aatcctatgagaattttagattgtaaaatcgatagagataaagaagcagtaaaaaatgca 

ccccgtatcacagattatctcaataatgattctaaatcttattatgaacaagttaaatta
catcttgataatttgaacatatcttatgttgaagatcctaacttagttcgtgggttagat 
tattatactcatactgcctttgaattaatgattgataatccagagtatgatggagctatc 
actacattatgtggtggtggtcgatataatgggttgttacaattattagatggtccagat 
gaaacaggtattgggtttgcactaagtattgaaagattattgatggcacttgatgaagaa 

ggtatttcattagatgtaagtgaagattttgatttatttgttgtgacaatgggagaagat
gccgatcgttatgctgttaagttaatcaatgatttaagaagaaatggaataaaagtagat 
aaggattatctaaacagaaaaattaaaggacaaatgaaacaagctgaccgtcttaatgct 
aaatatacagtagtaattggagatcaagagcttgaaaataatgaaattggtgtgaaaaac 
atgacctcaggcgaatcagaaaatgtacaattagacgaattggttaattattttacaagt 

agaaaggaagtctaa

Sequence 2756 

MIKMPRGTQDILPQDSAKWRYIENRLHTLMELYNYKEIRTPIFESTELFARGVGDSTDVV
QKEMYTFKDKGDRSLTLRPEGTAAVVRSYIEHKMQGEPNQPIKLYYNGPMFRYERKQKGR
YRQFNQFGVEAIGAENPSIDAEILAMVMHIYESFGLKHLKLVINSIGDSESRKEYNEALV
KHFEPVIDTFCSDCQSRLHTNPMRILDCKIDRDKEAVKNAPRITDYLNNDSKSYYEQVKL 
HLDNLNISYVEDPNLVRGLDYYTHTAFELMIDNPEYDGAITTLCGGGRYNGLLQLLDGPD 
ETGIGFALSIERLLMALDEEGISLDVSEDFDLFVVTMGEDADRYAVKLINDLRRNGIKVD 
KDYLNRKIKGQMKQADRLNAKYTVVIGDQELENNEIGVKNMISGESENVQLDELVNYFKS 
Sequence 2757

Contig_0815_pos_2348_3526,
is similar to (with p-value 1.0e-92)

>sp:sp|P38494|RS1H_BACSU 30S RIBOSOMAL PROTEIN S1 HOMOLOG. >
gp:gp|U11687|BSU11687_5 Bacillus subtilis 168 jofA, jofB, Ms
sA homolog (jofC) and ribosomal protein S1 homolog (jofD) ge
nes, complete cds, and joeB gene, partial cds. NID: g533101.
>gp:gp|Z99115|BSUB0012_228 Bacillus subtilis complete genom
e (section 12 of 21): from 2195541 to 2409220. NID: g2634478
. >gp:gp|L47648|BACSERA_20 Bacillus subtilis phosphoglycerat
e dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS,
ypbD, ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA)
, ypdA, ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB,
ypfA, ypfB, cytidine monophosphate kinase (cmk), ypfD, ypgA,
yphA, yphB, yphC, NAD+ dependent glycerol-3-phosphate dehyd
rogenase (glyc), yphE and yphF genes, complete cds. NID: g11
46195. >gp:gp|L47648|BACSERA_20 Bacillus subtilis phosphogly
cerate dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, r
ecS, ypbD, ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (
ypcA), ypdA, ypdB, ypdC, spore cortex lytic enzyme (sleB), y
peB, ypfA, ypfB, cytidine monophosphate kinase (cmk), ypfD,
ypgA, yphA, yphB, yphC, NAD+ dependent glycerol-3-phosphate
dehydrogenase (glyc), yphE and yphF genes, complete cds. NID
: g1146195.

atgactgaagaattcaatgaatcaatgattaatgatattaaagaaggtgacaaagtcact 
gttgaagttcaacaagtagaggataaacaagttgttgtgcatattaatggtggcaaattt 
aatggaattattcctattagccagctttcaacacatcatatcgaaaaccctagtgaagtt 

gtaaaagtcggtgatgaagtcgaagcatatgtcactaaaatcgagttcgacgaagaaaat
gatactggggcatacattttatcaaaaagacaacttgaaactgaaaaatcttatgaatat 
ttacaagaaaaactagataacgatgaagtgattgaagctgaagttactgaagtagttaaa 
ggtggtttagtcgttgacgttggtcaaagagggtttgtacctgcttctctaatttcaact 
gatttcattgaagatttttctgtattcgatggtcaaacaatccgtattaaagtggaagaa

cttgatcctgaaaacaatagagtcattttaagccgtaaagctgtggaacagttagaaaac
gacgctaaaaaagcttcaatattagattctttaaatgaaggcgatgttattgatggtaaa 
gttgctcgattaactaactttggtgctttcattgatattggtggcgtagatggtttagtt 
cacgtttctgaattatctcatgaacatgttcaaacaccagaagaagttgtgtcagtaggt 
gaagcagtcaaagttaaagttaaatctgtagaaaaagattctgaacgtatttctttatct 

40 attasaaacactttaccaacaccatttgaaaacattaaagggaaatttcacgaagotgat 
gttattgaaggtactgtagtacgtttggcgaactttggcgcattcgtagaaattgctcca 
tccgtccaaggtttagtgcatatttctgaaatcgatcataaacatatcggttctcctaac 
gaagtattagaacctggacaacaagttaatgtaaaaatattaggtatcgatgaagataat 
gaaagaatttcattatcaatcaaagcaacgttacctaaagaaaatgtcattgaaagtgac 

gcatccacaactcaatcatatcttgaagatgataatgatgaagataaaccaacattaggc
gatgtttttggtgataaatttaaagaccttaagttttaa 

Sequence 2758 

MTEEFNESMINDIKEGDKVTVEVQQVEDKQVVVHINGGKFNGIIPISQLSTHHIENPSEV 
VKVGDEVEAYVTKIEFDEENDTGAYILSKRQLETEKSYEYLQEKLDNDEVIEAEVTEVVK
GGLVVDVGQRGFVPASLISTDFIEDFSVFDGQTIRIKVEELDPENNRVILSRKAVEQLEN
DAKKASILDSLNEGDVIDGKVARLTNFGAFIDIGGVDGLVHVSELSHEHVQTPEEVVSVG 
EAVKVKVKSVEKDSERISLSIKDTLPTPFENIKGKFHEDDVIEGTVVRLANFGAFVEIAP 
SVQGLVHISEIDHKHIGSPNEVLEPGQQVNVKILGIDEDNERISLSIKATLPKENVIESD 
ASTTQSYLEDDNDEDKPTLGDVFGDKFKDLKF*

Sequence 2759 

Contig_0815_pos_4093_5052,

is similar to (with p-value O.Oe+00) 
>sp:sp|P50743|YPHC_BACSU HYPOTHETICAL 48.8 KD GTP-BINDING PR
OTEIN IN CMK-GPSA INTERGENIC REGION. >gp:gp|Z99115|BSUB0012_
224 Bacillus subtilis complete genome (section 12 of 21): fr
om 2195541 to 2409220. NID: g2634478. >gp:gp|L47648|BACSERA_
24 Bacillus subtilis phosphoglycerate dehydrogenase (serA),
ypaA, ferredoxin (fer), ypbB, recS, ypbD, ypbE, ypbF, ypbG,
ypbH, glutamate dehydrogenase (ypcA), ypdA, ypdB, ypdC, spor
e cortex lytic enzyme (sleB), ypeB, ypfA, ypfB, cytidine mon
ophosphate kinase (cmk), ypfD, ypgA, yphA, yphB, yphC, NAD+
dependent glycerol-3-phosphate dehydrogenase (glyc), yphE an
d yphF genes, complete cds. NID: g1146195. >gp:gp|L47648|BAC
SERA_24 Bacillus subtilis phosphoglycerate dehydrogenase (se
rA), ypaA, ferredoxin (fer), ypbB, recS, ypbD, ypbE, ypbF, y
pbG, ypbH, glutamate dehydrogenase (ypcA), ypdA, ypdB, ypdC,
spore cortex lytic enzyme (sleB), ypeB, ypfA, ypfB, cytidin
e monophosphate kinase (cmk), ypfD, ypgA, yphA, yphB, yphC,
NAD+ dependent glycerol-3-phosphate dehydrogenase (glyc), yp
hE and yphF genes, complete cds. NID: g1146195.
gtgaataaagttgataatcttgaaatgcgtaatgatatctatgatttctattctttaggc 

tttggagatccatatcctatttctggttcacatggattaggacttggagatttgctagat
gcagttgttgaaaactttaataaagaatcagaagatccttatgacgaagatacgatacgt 
ctttctatcatcggtagacctaatgttggtaaatctagcttggtcaatgctattttaggc 
gaagaacgtgttattgtgtctaatgttgctggtacaactcgagatgccattgataccgag 
tactcttatgatggacaagattatgtattgattgatactgctggaatgagaaaaaaaggt 

aaggtgtatgaatcgactgaaaaatattctgtattacgtgcattaaaagcgattgagcgt
tcagaagtagtattagtagttatcgatgctgaacaaggtataattgaacaagataaacgt 
gtagctggctatgcacatgaggaaggtaaagctattgtcattgtagtaaataaatgggat 
acagttgaaaaagatagtaagacaatgaaaaaattcactgatgatgttagaaatgaattt 
caatttttagattatgctcaaatcgcgttcgtatcagcaaaagaagggctaagattaaaa 

acattattcccttatatcaatcaagccagtgaaaatcataaaaagcgtgtccaaagttct
acactaaatgaagttgttactgatgccatctctatgaatccaacacctactgacaaaggt 
agaa vacvtaatgtattct atacaactcaggttgcaattgaaccaccgacatttgVagta 
tttgtcaatgatgttgaattaatgcatttttcttataggagatatttagaaaatcaaata 
cgtaatgcttttggttttgaaggaacacctattcatattattccaagaaaaagaaattaa 

Sequence 2760 

VNKVDNLEMRNDIYDFYSLGFGDPYPISGSHGLGLGDLLDAVVENFNKESEDPYDEDTIR 
LSIIGRPNVGKSSLVNAILGEERVIVSNVAGTTRDAIDTEYSYDGQDYVLIDTAGMRKKG 
KVYESTEKYSVLRALKAIERSEVVLVVIDAEQGIIEQDKRVAGYAHEEGKAIVIVVNKWD
TVEKDSKTMKKFTDDVRNEFQFLDYAQIAFVSAKEGLRLKTLFPYINQASENHKKRVQSS 
TLNEVVTDAISMNPTPTDKGRRLNVFYTTQVAIEPPTFVVFVNDVELMHFSYRRYLENQI
RNAFGFEGTPIHIIPRKRN* 

Sequence 2761

Contig_0815_pos_5073_6071,

is similar to (with p-value 0.0e+00)

>gp:gp|Z99115|BSUB0012_223 Bacillus subtilis complete genome
(section 12 of 21): from 2195541 to 2409220. NID: g2634478.
>gp:gp|L47648|BACSERA_25 Bacillus subtilis phosphoglycerate
dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, recS, y
pbD, ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (ypcA),
ypdA, ypdB, ypdC, spore cortex lytic enzyme (sleB), ypeB, y
pfA, ypfB, cytidine monophosphate kinase (cmk), ypfD, ypgA,
yphA, yphB, yphC, NAD+ dependent glycerol-3-phosphate dehydr
ogenase (glyc), yphE and yphF genes, complete cds. NID: g114
6195. >gp:gp|L47648|BACSERA_25 Bacillus subtilis phosphoglyc
erate dehydrogenase (serA), ypaA, ferredoxin (fer), ypbB, re
cS, ypbD, ypbE, ypbF, ypbG, ypbH, glutamate dehydrogenase (y
pcA), ypdA, ypdB, ypdC, spore cortex lytic enzyme (sleB), yp
eB, ypfA, ypfB, cytidine monophosphate kinase (cmk), ypfD, y
pgA, yphA, yphB, yphC, NAD+ dependent glycerol-3-phosphate d
ehydrogenase (glyc), yphE and yphF genes, complete cds. NID:
g1146195.

atgagaaaaattacagtttttggtatgggtagttttggtactgcattagctaatgtatta 
gctcaaaatggtcatgatgttttaatgtggggcaaaaatgtagagaatgtagatgaactt 
aacacacatcatatgaacaaaaattatcttaaagatgctaaattagattcatctataaaa 
gcaactgtagatttaaataaggcagtgcaattttcagatatctatcttatggcactacct 

acaaaagcaattagagaagtatcaaaagatatcgatcaattactcacatctaaaaagact
tttattcatgttgctaaaggcattgaaaacgatacatttaagcgcgtatctgaaatgatt 
gaggactctatctcttcagaacataatggaggaatcggcgtcttatcaggtccaagtcat 
gctgaagaagttgttataaaacaacctacaactgtagctgcatcatctaaagataataat 
gtgagcaaacttattcaagatttatttatgaacgactatttacgtgtttacacaaataat 

gatttagtaggtgtagaattaggtggtgctttaaaaaatattatagctatagctagtggt
atcgttgccggcatgggttacggtgataatgcaaaagcagctttaatgacacgaggttta 
gccgaaatcagtcgacttggtgagaaacttggtgcagatccaatgactttcttaggtcta 
ggtggcataggtgacttgatcgtaacttgtacgtccacacattcacgaaattacacactt 
gggtttaaattagggcaaggcaaaacagcagaagaagctttaaaagagatgaaaatggtg 

gttgaaggtatttatacaactaaatcagtatatcatcttgctcaacaagaaggagtagag
atgcctatcactaacgcattatatgaagttttatttgaagatgtccctgtaagtaaaagt 
gttagaacacttatggaaagagacaaaaaagcagaataa 

Sequence 2762 

MRKITVFGMGSFGTALANVLAQNGHDVLMWGKNVENVDELNTHHMNKNYLKDAKLDSSIK
ATVDLNKAVQFSDIYLMALPTKAIREVSKDIDQLLTSKKTFIHVAKGIENDTFKRVSEMI

EDSISSEHNGGIGVLSGPSHAEEVVIKQPTTVAASSKDNNVSKLIQDLFMNDYLRVYTNN 
DLVGVELGGALKNIIAIASGIVAGMGYGDNAKAALMTRGLAEISRLGEKLGADPMTFLGL
GGIGDLIVTCTSTHSRNYTLGFKLGQGKTAEEALKEMKMVVEGIYTTKSVYHLAQQEGVE
MPITNALYEVLFEDVPVSKSVRTLMERDKKAE*

Sequence 2763 

Contig_0815_pos_0_368,

is similar to (with p-value 2.0e-21) 

>sp:sp|P42086|PBUX_BACSU XANTHINE PERMEASE. >pir:pir|S51310|
S51310 xanthine permease - Bacillus subtilis >gp:gp|L77246|B
ACYACA_3 Bacillus subtilis (YAC10-9 clone) DNA region betwee
n the serA and kdg loci. NID: g1256615. >gp:gp|Z99115|BSUB00
12_147 Bacillus subtilis complete genome (section 12 of 21):
from 2195541 to 2409220. NID: g2634478. >gp:gp|X83878|BSXPT
PBUX_2 B. subtilis xpt and pbuX genes. NID: g633168.
atgtgcggggtagcgacatttcttcaagcaaataaagtcacagggactggattaccgatt 
gtactaggatgtacgtttactgccgttgcacctatgatactcatcggtcaaacgaaagga
cttgatgttttatatggttcgcttttaatatccggtatcttagttgttttaattgcacct 

tttttctcttatttagttaaattctttccacctgttgtaacaggaagtgttgtgacaatt
attggaatcaatttaatgccagttgcaatgaattacttggcaggtggtgaaggagcgaaa 
aactatggcgatactaagaatttaatattaggtggtgttacactactcattattcttatt 
tGATTTAT 

Sequence 2764

MCGVATFLQANKVTGTGLPIVLGCTFTAVAPMILIGQTKGLDVLYGSLLISGILVVLIAP 
FFSYLVKFFPPVVTGSVVTIIGINLMPVAMNYLAGGEGAKNYGDTKNLILGGVTLLIILI 
*FX 

Sequence 2765

Contig_0818_pos_4072_3692,

putative peptide of unknown function 

gtgtttattttgttaacatttggattttatgtattttttgctggccataataatccaggt 
ggtggctttattggtggcttgatttttagctcggcatttatcttaatgtttcttgccttt 
gatgtaaatgaagtgttgaaaagcttgcctattgattttaaaaaattaatgattataggt
tcactcatatctgttgcaactgcatcagtgcctatgttttttgggaagccatttttatat 
caaactgaagcaaatgtaacatttccattactaggacatgttcatgttactactgtgact 
ttatttgagcttggcatcttattaacagtagtaggtgtgattgttacagttatgctatct 
ataagtgggggtagatcatga

Sequence 2766 

VFILLTFGFYVFFAGHNNPGGGFIGGLIFSSAFILMFLAFDVNEVLKSLPIDFKKLMIIG 
SLISVATASVPMFFGKPFLYQTEANVTFPLLGHVHVTTVTLFELGILLTVVGVIVTVMLS 
ISGGRS*

Sequence 2767 

Contig_0818_pos_3674_3351,
is similar to (with p-value 3.0e-20) 
>gp:gp|AB015981|AB015981_4 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c 
ds. NID: g4001723. 

gtgataggatttttagtgtttattggaacttatatgattttatctattaatttaattcgt 
attgttattggtatttctatttatacacacgccggtaatttaattattatgagtatgggg 
aaatatggacctcatatgtctgaaccgctaattcaaggtcatgctcaaaactttgttgat
cctttattacaagctatcgttttaacagctattgtgattggatttggtatgactgcgttt 
ttattggtgttaatatatagaacttacagagtaactaaagaggatgaaataagtgcattg 
aaaggtgatgaagatgatgagtaa 

Sequence 2768

VIGFLVFIGTYMILSINLIRIVIGISIYTHAGNLIIMSMGKYGPHMSEPLIQGHAQNFVD
PLLQAIVLTAIVIGFGMTAFLLVLIYRTYRVTKEDEISALKGDEDDE* 

Sequence 2769 
Contig_0818_pos_3334_1859,

is similar to (with p-value 2.0e-62) 

>gp:gp|AB015981|AB015981_5 Staphylococcus aureus genes for O
rfA, MnhA, MnhB, MnhC, MnhD, MnhE, MnhF and MnhG, complete c 
ds. NID: g4001723. 

atgttgttgccttttgtatgtgctttaattttagtcttcactaaaaataaaaatcgtatt
tcgaaaatcctatccattacaactatgattgttaatacaatgatttcaattgctttactt 
atttatgtcgttaatcataaaccgataacacttgattttgggggatggaaagcacctttc 
ggcattcaatttctaggtgattcactgagtctgcttatggtgtcagtatcatcttttgtt 
gttacgctaataatggcatacggctttggtagaggggagaagcgagtcaatcgatttcac 

cttcctacatttattcttttattaacagtaggtgttattggttcgtttttaacttctgat
ttatttaacctatacgtgatgtttgaaattatgcttcttgcttcgtttgtacttgttaca
ttaggacaatctgttgaacaattacgtgcagcgatagtatatgttgttctgaatatttta 
ggttcgtggttgcttttattaggaattggcatgttatataagacagtcggaacacttaat
ttctcacatttagcgatgcgattgaatcatatggaaaataaccaaacaataacgatgata 

tctttagtatttctagttgcttttagttcaaaggcagcactagtgattttcatgtggtta
cctaaagcatatgcagtgcttaatacggaacttgccgcgttatttgcagcattgatgaca 
aaagttggagcttatgcgcttattcgtttttttactttactattcgaccatcatccaagc 
gtcacgcatacattgctcgtgtttatggcttgtatcacaatgattatcggtgcatttggt 
gtcatcgcttacaaagatattaagaaaattgcggcttatcaagttattttgtctattgga

ttcattattttaggtttaggttctcatactatatcaggtgtaaatggtgctatcttctat
ttagcgaatgatattatcgttaagacattattgttttttgtaattggtagtcttgtttat 
atgtcaggctatcgaaattatcagtatttaagtggactggcaaaaagagaaccattcttt 
ggtgttgcatttgtcgtggtaatttttgctataggtggcgtacctccttttagtggcttt 
ccgggtaaagtcttaatattccaaggggctattacaaatggtaattatattggtttagca 

cttatgattgtgacaagtttaattgctatgtatagtctttttagagtgatgtttataatg
tattttggtgatgctgacggagaacaagtacaatttagaccactacctatttatcgtaaa 
ggtttacttagtgttttagttgtagtggtattagcgatgggtattgcagcccctgttgtt 
ctgaaagtaacagaggatgcaacaaatcttaatatgaaagaagatgtctttcaaaagaat 
gtaaatacacatttgaaggaggttaatcataagtga 



Sequence 2770 

MLLPFVCALILVFTKNKNRISKILSITTMIVNTMISIALLIYVVNHKPITLDFGGWKAPF
GIQFLGDSLSLLMVSVSSFVVTLIMAYGFGRGEKRVNRFHLPTFILLLTVGVIGSFLTSD 
LFNLYVMFEIMLLASFVLVTLGQSVEQLRAAIVYVVLNILGSWLLLLGIGMLYKTVGTLN
FSHLAMRLNHMENNQTITMISLVFLVAFSSKAALVIFMWLPKAYAVLNTELAALFAALMT 
KVGAYALIRFFTLLFDHHPSVTHTLLVFMACITMIIGAFGVIAYKDIKKIAAYQVILSIG 
FIILGLGSHTISGVNGAIFYLANDIIVKTLLFFVIGSLVYMSGYRNYQYLSGLAKREPFF 
GVAFVVVIFAIGGVPPFSGFPGKVLIFQGAITNGNYIGLALMIVTSLIAMYSLFRVMFIM 
YFGDADGEQVQFRPLPIYRKGLLSVLVVVVLAMGIAAPVVLKVTEDATNLNMKEDVFQKN
VNTHLKEVNHK* 

Sequence 2771 
Contig_0818_pos_1751_1380, 

putative peptide of unknown function

gtgatttatattctgcatcgcttttttggtgaagaattttatttgaaaaagatatgggtg 
gctattaaatttttagctgtatacctataccagcttattacttctagtataagtaccata 
aattacatcttatttaagacgaatgaagttaatccaggtttactcacatatgaaacttca 
ttaaaaagtaattgggctattacttttttaacgattttaattattattactccaggatcg 

acagttattcgaatttctaaaaatactaataaattttttattcacagtattgatgtgtca
gaaaaagataaagaaaatcttctaaaaagtattaagcagtatgaggatttaattttggag 
gtgacacgatga

Sequence 2772 

VIYILHRFFGEEFYLKKIWVAIKFLAVYLYQLITSSISTINYILFKTNEVNPGLLTYETS
LKSNWAITFLTILIIITPGSTVIRISKNTNKFFIHSIDVSEKDKENLLKSIKQYEDLILE 
VTR* 

Sequence 2773 
Contig_0819_pos_5530_6732,

is similar to (with p-value O.Oe+00) 

>sp:sp|Q53634|MENE_STAAU O-SUCCINYLBENZOIC ACID--COA LIGASE
(EC 6.2.1.26) (OSB-COA SYNTHETASE) (O-SUCCINYLBENZOATE-COA S
YNTHASE). >gp:gp|U51132|SAU51132_1 Staphylococcus aureus o-s
uccinylbenzoic acid CoA ligase (mene), and o-succinylbenzoic
acid synthetase (menc) genes, complete cds. NID: g1255258.
atgataaatacacgtttaacgcgacatgagatgataaatcaaatgaattcagtcgacata 
gcaacgattgtacacacgttgcctttagaattagaagggtttaatttatatcattttaat 
gatttaacacaattagataaacatgatgtttcaggttacaaatttaatttagaatcgatt 

gcatcaattatgtttacgtctggaacgacgggacctcaaaaagctgtgcctcaaacgttt
aataatcatttagccagtgctaaaggctgtaaacaaagtttaggattcgaacaaaatact
gtgtggctttcggtcttacctatatatcatatttctgggctcagtgttattttgcgcgca 
gtgatagaaggattcactgtcagacttgttaaaaagtttcaaactgatgatatgttaaca 
caaataaagacttatccaatcacccatatgtcccttgttccacaaacgttaaagtggtta 

atggatgcaggattgactcaaccattttctttagaaaaaattctgctaggtggtgctaaa
ttatcaccacaattaattgagcaagcattgacttatcgtttacctgtatataattctttt 
ggtatgacagaaacttgctctcagtttctaacagcctcacctcaaatgctcaaagaacgt 
ttcgatactgttggaaaaccaagtgaaaatgtcgaagtgaaaataaaaaatcccaacgca 
tatggacatggagagttattaattaaaggtgaaaatgtgatgaatggttatttatatccc 

aaatatttaaaagacacatttgataatgatgggtattttcaaactggagatatagctgaa
atagatgatgaaggttacgtcataatatatgatcggcgcaaagatttgattataagtggt 
ggagagaatatttatccttaccaaattgaaacaatcgcaaaagactttgaaggcattgaa 
gatgccgtatgtgtaggaatatcagatgatacttggggtcaagtaccaatattatattat 
gtgacaaatcaagatattaatcaaactgaattaatagaacattttgagaatcatttagct 

agatataaaattcctaaaaaatattatcaggtcaaatctttaccttatacatcgacaggt
aaattacaacgtaaaaaggtcaaaagtgaagacttgaatgagggaaagaataatgaaagt 
taa 

Sequence 2774 
MINTRLTRHEMINQMNSVDIATIVHTLPLELEGFNLYHFNDLTQLDKHDVSGYKFNLESI
ASIMFTSGTTGPQKAVPQTFNNHLASAKGCKQSLGFEQNTVWLSVLPIYHISGLSVILRA 
VIEGFTVRLVKKFQTDDMLTQIKTYPITHMSLVPQTLKWLMDAGLTQPFSLEKILLGGAK 
LSPQLIEQALTYRLPVYNSFGMTETCSQFLTASPQMLKERFDTVGKPSENVEVKIKNPNA 
YGHGELLIKGENVMNGYLYPKYLKDTFDNDGYFQTGDIAEIDDEGYVIIYDRRKDLIISG
GENIYPYQIETIAKDFEGIEDAVCVGISDDTWGQVPILYYVTNQDINQTELIEHFENHLA 
RYKIPKKYYQVKSLPYTSTGKLQRKKVKSEDLNEGKNNES* 

Sequence 2775 
Contig_0819_pos_6791_0,

is similar to (with p-value 6.0e-43) 

>gp:gp|U51132|SAU51132_2 Staphylococcus aureus o-succinylben
zoic acid CoA ligase (mene), and o-succinylbenzoic acid synt
hetase (menc) genes, complete cds. NID: g1255258.

gtgaagttaacgcatcgagaaagtttgtttacggaaatagtaacttatagtggagaaact
tattatggagaatgtaatgcatttttaactaattggtatgataaagaaacaatactcaca 
gtcgtaaacagattaagacagtggataccgcaagtacttcataaagatatgacatctttt 
gattcatggttaccttacctaaatcaaatgaatgatgcgccagctgctagatcaatggtt 
gtcatggctgtttatcaaatgtataacgacttgcatgattttgaagtacaatacggtgct 

acagttagtggcttaactaatagtcaaattgaaacattattagaaacaagaccgaaacgt
ataaaacttaaatggtcaacatcactcatcaaagatcttgaaactatacgtttattaaat 
tttgattgtgatattgctatagatgcaaatgaatcattaacaaagccatcatttttacaa 
ttagccaacgtaaatacatcagatattatatatattgaagaaccttttaaaattct 

Sequence 2776

VKLTHRESLFTEIVTYSGETYYGECNAFLTNWYDKETILTVVNRLRQWIPQVLHKDMTSF 
DSWLPYLNQMNDAPAARSMVVMAVYQMYNDLHDFEVQYGATVSGLTNSQIETLLETRPKR 
IKLKWSTSLIKDLETIRLLNFDCDIAIDANESLTKPSFLQLANVNTSDIIYIEEPFKIL 

Sequence 2777

Contig_0819_pos_4059_3394, 

putative peptide of unknown function 

atgcaacaagaaacgacatcatggtacaaacaagaatggtttatagttttatcactttta 
ttcatttttccactaggtttatttctcatgtggaaatttagcaagtggccatctattgca 

agaacaatcattactgttgcaatttcagttatcgtattagcaagcattacctattatggt
aatctacaaatgattgtaccagcaacatcaaattcaaataacgaaactaaagaaactaca 
gagaataatgtaaatgataaagacgagcgaaatcataaaactgcagtagaagaaacaaaa 
actaattatgactccaccaaagaaaatactaaagaacctggaaaagaaaatgaatctgca 
acacgattggagaactctgcgcttgaaaaggcaaagtcatattatgatgattttcacatg 

tctaaactaggaatttatgatattttaacatctgaatatggagaaaaatttgataaagaa
gatgcacaatatgctatagatcatctagaggctgattatgaaaagaatgcacttgagaaa 
gcaaaatcatatgccaaagatatgcatatgtctaatgactcaatttacgatcttttggtg 
tctaactacggtgaaaaatttacagaatcagaagcaaaatatgctattgagcatttggat 
aattaa 

Sequence 2778 

MQQETTSWYKQEWFIVLSLLFIFPLGLFLMWKFSKWPSIARTIITVAISVIVLASITYYG 
NLQMIVPATSNSNNETKETTENNVNDKDERNHKTAVEETKTNYDSTKENTKEPGKENESA 
TRLENSALEKAKSYYDDFHMSKLGIYDILTSEYGEKFDKEDAQYAIDHLEADYEKNALEK 
AKSYAKDMHMSNDSIYDLLVSNYGEKFTESEAKYAIEHLDN*

Sequence 2779 
Contig_0819_pos_2417_1683,
putative peptide of unknown function
atgagaaaattatttctatcaattttatctattataataatttcaagtttttgtgttgca
acagggtttcaaaacgttaatgctgcaaataatgaggtgagcaaacctcaaagtaatgtg 
gatagtaaaactaaacaaaatattattaaaaaaataaaaaaatctaatgcttataaaaag 
cacgctaattcaacttctattgactcaataaaagatgacgatattattgttcatttggat 
aaaggtaaaaacaccaatgtatattctattaactatgtatttggtaaaaaattagctaaa 
atgaatgataaccttgcaatgattgaattaaaatataacgaagcaaacaatgaagttttc
tacaatcatatgatgtattcaaaatttgttaaatataataataaagaatatattaatatg 
aaaggcattttaaatggcaaaccttattatgaatttaatatagatcagaaaggtcattac
tacgataaaaactttaaacatacctctaaagatgaaatcgaaaaagattctgctaaaaac 
ttaccacctaaagagcgtggttggtgtgaatgggcagtaggagctttatgtggtaccggt
ggagctgcaggttgttgggcaactgctacagctttaggtattactactggttggggaggc 
ttttcattagctacaatttgtggtctgataagctctctaggctgtactggtgcaaccaac 
tatatttgtaaataa 

Sequence 2780

MRKLFLSILSIIIISSFCVATGFQNVNAANNEVSKPQSNVDSKTKQNIIKKIKKSNAYKK 
HANSTSIDSIKDDDIIVHLDKGKNTNVYSINYVFGKKLAKMNDNLAMIELKYNEANNEVF 
YNHMMYSKFVKYNNKEYINMKGILNGKPYYEFNIDQKGHYYDKNFKHTSKDEIEKDSAKN 
LPPKERGWCEWAVGALCGTGGAAGCWATATALGITTGWGGFSLATICGLISSLGCTGATN 

YICK*

Sequence 2781 

Contig_0822_pos_6711_5857, 

is similar to (with p-value 2.0e-42) 

>gp:gp|D78193|BACGNTZA_33 Bacillus subtilis 36kb sequence be
tween gntZ and trnY genes encoding 34 ORFs. NID: g1064780. >
gp:gp|Z99124|BSUB0021_145 Bacillus subtilis complete genome
(section 21 of 21): from 3999281 to 4214814. NID: g2636<i42. 
atgaagtggcttaaacaactacaatcccttcacacgaaactcgttattgtttatgtacta 

ctcattattattggtatgcaaatcatcggtttgtattttacgaatagtttagaaaaggaa
ttactcgataacttcaagaagaacataacacaatatgcgaagcaattagacgtcaatatt 
gaaaaggtttataaagataaagataaaggttcagtcaacgctcaaaaggatatccaagac 
cttttgaatgaatatgcgaatcgccaagaaataggagaaatacgctttattgataaagac 
caaattatcatggcaacaaccaagcagtctaaccgtggtcttatcaatcaaaaggttaac 

gacggttcagttcaaaaggcgctctccttagggcaaacgaatgatcatatggttcttaag
gattacggaagtggtaaagagcgtgtttgggtatataatataccggttaaagttgataaa 
cagacaatcggtgatatatacatagaatcgaaaattaatgatgtatacaatcagctgaac 
aacattaatcagatattcatcgtagggacagcgatatcactattcattacagtaatacta 
ggattcttcattgcacgaacgattactaagccgataaccgatatgcgtaaccaaaccgtt 

gagatgtctaaaggtaactacacgcaacgagtgaagatatacggtaacgatgaaatcggt
gagctcgcacttgccttcaataacttatcgaaacgtgtccaagaagcacaagcgaataca 
gaaagtgagaaacgtcgcctagattctgttatcacacataattacttgttgaatgtatta 
cattttgattgttaa 

Sequence 2782

MKWLKQLQSLHTKLVIVYVLLIIIGMQIIGLYFTNSLEKELLDNFKKNITQYAKQLDVNI 
EKVYKDKDKGSVNAQKDIQDLLNEYANRQEIGEIRFIDKDQIIMATTKQSNRGLINQKVN
DGSVQKALSLGQTNDHMVLKDYGSGKERVWVYNIPVKVDKQTIGDIYIESKINDVYNQLN 
NINQIFIVGTAISLFITVILGFFIARTITKPITDMRNQTVEMSKGNYTQRVKIYGNDEIG 

ELALAFNNLSKRVQEAQANTESEKRRLDSVITHNYLLNVLHFDC*

Sequence 2783 

Contig_0824_pos_5618_5929, 

putative peptide of unknown function 

atgcacataaattcaaaattagctaaaatgccgcgtagaattgatggcaagttattctcg
gttagtcatacatcagtaagtaaagcgtttagaaaagcaaaagaagtgataggattaaac 
gataataatataactccctattcactcagacatacgcacacatcttacttactatctaaa 
ggcataccaatcgagtatataagtaaacgtttaggtcacgctactatatcacaaacgtta 
gacacgtattcacatttattagaagaacataaaaaagagcaaggccaacgtgtcagagaa 

atattctcttga

Sequence 2784 

MHINSKLAKMPRRIDGKLFSVSHTSVSKAFRKAKEVIGLNDNNITPYSLRHTHTSYLLSK 
GIPIEYISKRLGHATISQTLDTYSHLLEEHKKEQGQRVREIFS* 
Sequence 2785
Contig_0824_pos_4770_3757,
putative peptide of unknown function
atggaacgattttgttgtgtaaatcaaattaactatattcaaatgaatccgttagaagcc
aaatttaaaacgagcgctctaagatcatggaaaactgatcaggcagatgctcataagctt 
gcttgtttaggaccgacgcttaaacaaacagacaacttacctatacatgagttaatattc 
tttgaattaagagaacgcgtccgttttcatctagaaatcgagaatgaacaaaatcgactt 
aaatttcagatccttgaattactccatcaaacattccctggtttagaaagattatttagt 

agtcgatattcaatcattgcactcaacatcgcagaaatctttactcatccagacatggtt
cttgatatcgacaaggaggtactgattacacatatattcaattctacagataagggaatg 
tcaatggataaagctacaaaatatgcacttcaattaagggtgattgctcaagaaagctat
cctaatgtcgatagacattcctttctagtcgaaaaattacgcttacttattcaacaatta
aaijc.iat.-rtattcatcatctcaaacaattagatgatgccatgattcaattagcacaacaa 

ctcgattattttgaaaatattcattcgatacctggtattggtaagctaagcacagctatg
attattggggagattggtgatattaagcgatttaaatcaaataaacaactcaatgctttt 
gttggcattgatatcaaacgatatcaatcaggtcatacacactgtagagataccatcaac 
aagcgtggtaataaaaaagcgagaaaacttttattttgggtgattatgaatataataaga 
gggcagcatcattatgacaatcatgtcgtcgattattactacaaactaagaaagcagcct

20 aatgagaaacctcataagactgccatcattgcttgtataaatcgattattaaaaacaatt 
cattatcttgtaatgaatcataaattgtacgattatcaaatgtcaccacattag 

Sequence 2786 

MERFCCVNQINYIQMNPLEAKFKTSALRSWKTDQADAHKLACLGPTLKQTDNLPIHELIF
FELRERVRFHLEIENEQNRLKFQILELLHQTFPGLERLFSSRYSIIALNIAEIFTHPDMV
LDIDKEVLITHIFNSTDKGMSMDKATKYALQLRVIAQESYPNVDRHSFLVEKLRLLIQQL 
KQSIHHLKQLDDAMIQLAQQLDYFENIHSIPGIGKLSTAMIIGEIGDIKRFKSNKQLNAF 
VGIDIKRYQSGHTHCRDTINKRGNKKARKLLFWVIMNIIRGQHHYDNHVVDYYYKLRKQP 
NEKPHKTAIIACINRLLKTIHYLVMNHKLYDYQMSPH* 


Sequence 2787 

Contig_0824_pos_2942_2442,

is similar to (with p-value 7.0e-38) 

>gp:gp|AF082668|AF082668_1 Streptococcus pyogenes CsrR (csrR
) and CsrS (csrS) genes, complete cds. NID: g3599370.

atgcttccaaacataaatggtctagaaatttgtagacaaattcgtcaaaaaacaactact 
ccaattatcatcattactgcaaaaagcgagacatatgataaagtagctgggttggactat 
ggggcagatgactacattgtaaaaccctttgatatagaagaattgctcgcaagaataaga 
gcggtattgcgcagacagccagataaagatgttttagatatcaatggtattatcattgat 
40 aaagatgcctttaaagttactgttaatggccatcaattagaattaactaaaacagaatac 
gatttattatatgttttagctgaaaatcgtaaccacgtcatgcagcgtgaacaaattctc 
gatcacgtatgggggtataatagtgaagtagaaacgaatgtcgttgatgtttacattcgt 
tatttacgtaataaactcaaaccttttaataaagaaaaatccatagaaacagtacgtggc 
gtagggtatgtgattcgatga 


Sequence 2788 

MLPNINGLEICRQIRQKTTTPIIIITAKSETYDKVAGLDYGADDYIVKPFDIEELLARIR 
AVLRRQPDKDVLDINGIIIDKDAFKVTVNGHQLELTKTEYDLLYVLAENRNHVMQREQIL 
DHVWGYNSEVETNVVDVYIRYLRNKLKPFNKEKSIETVRGVGYVIR*


Sequence 2789 

Contig_0824_pos_1929_1075,

is similar to (with p-value 2.0e-42)

>gp:gp|U81166|LLU81166_1 Lactococcus lactis subsp. cremoris
MG1363 histidine kinase (llkinA) gene, complete cds. NID: g2
182834.

gtgagttatatcttttcttcgcaaattactaaaccgatagttacaatgtccaataaaatg 
aatcaaattagaagagatggttttcaaaataaacttgaattaactacaaattatgaagaa 
acagataatttaattgatacttttaatgaaatgatgtatcaaatagaagaatcttttaat 
cagcaacgtcaatttgtcgaggatgcttcacacgaattaagaacgccactgcagattatt
caaggtcatctaaatttaatccaacgttgggggaaaaaagatccagcagttttggaagaa 
tctttgaatatttcaattgaagaagtgaatcgaataacaaaacttgtcgaagaactactt 
ttacttaccaaagatagagtcaatcataatgttttggaatgtgaaaatgtagacgtaaat 
5 agcgagattcaatcacgtgtgaagtcactgcaacacctacatccagattatacttttgaa 
acacatcttgctactaagcctatccaattaaaaattaaccgtcatcagtttgaacaactc 
ttactcatatttattgataatgcaatgaaatacgacactgaacataagcacattaaaatt 
gttactcaactaaaaaataaaatgattatgattgatattactgatcatggtatgggtata 
ccaaaagctgacttagaatttatctttgatagattttatcgtgtagataaatcacgtgct 
10 cgtagtcaaggaggcaatggattaggactatcaatagcagaaaaaattgtgcaacttaac 
ggtggtatgattcaagtagaaagtgaactacaaaagtacacgactttcaaaatcagtttt 
ccagtactaaactaa 

Sequence 2790 

15 VSYIFSSQITKPIVTMSNKMNQIRRDGFQNKLELTTNYEETDNLIDTFNEMMYQIEESFN 
QQRQFVEDASHELRTPLQIIQGHLNLIQRWGKKDPAVLEESLNISIEEVNRITKLVEELL 
LLTKDRVNHNVLECENVDVNSEIQSRVKSLQHLHPDYTFETHLATKPIQLKINRHQFEQL 
LLIFIDNAMKYDTEHKHIKIVTQLKNKMIMIDITDHGMGIPKADLEFIFDRFYRVDKSRA 
RSQGGNGLGLSIAEKIVQLNGGMIQVESELQKYTTFKISFPVLN* 


Sequence 2791 

Contig_0824_pos_0_597, 

is similar to (with p-value 8.0e-29) 

>pir:pir|S25295|A32879 oxoglutarate dehydrogenase (lipoamide
) (EC 1.2.4.2) - Bacillus subtilis

atgctagatttgtatgatgattatctacaaaatccatcatccgtacctgaagatttacaa 
gtcttgttcagtacaattaaaacaggtgaagctcatatcgaagctaaacctaccactgat 
gggggtggttcacaagcgggagatagcacaattaaacgtgttatgcgcttaatcgataat 
attcgtcaatacggacatttaaaagcagatatttatccagtaaatcctccagagcgtcaa 

30 aatgttcctaaattggaaatcgaagattttgatttagataaagaaactttggaaaaaata 
tcatctggaattgtctctgaacattttaaagacatttatgacaatgcctatgatgcaatt 
gttcgtatggaaagacgttacaaaggaccgatagcttttgaatacactcacattaataat 
aataaagaacgtgtgtggttaaaaagaagaattgaaacgccttataaagcaagtttaaac 
gataatcaaaaaaaagaacttttcaaaaaactcgcacacgtagaaggttttgaaaaatat 

35 ttgcacaaaaattttgttggggctaaacgtttttcaattgaaggcgtcgaTC 

Sequence 2792 

MLDLYDDYLQNPSSVPEDLQVLFSTIKTGEAHIEAKPTTDGGGSQAGDSTIKRVMRLIDN 
IRQYGHLKADIYPVNPPERQNVPKLEIEDFDLDKETLEKISSGIVSEHFKDIYDNAYDAI 
40 VRMERRYKGPIAFEYTHINNNKERVWLKRRIETPYKASLNDNQKKELFKKLAHVEGFEKY 
LHKNFVGAKRFSIEGVDX



Sequence 2793 

Contig_0440_pos_5821_6999
is similar to (with p-value 2.0e-24)

>sp:sp|P23524|YHAD_ECOLI HYPOTHETICAL 42.1 KD PROTEIN IN RN
PB-SOHA INTERGENIC REGION (ORF 3) (F408). >pir:pir|JQ0614|JQ
0614 hypothetical 42K protein - Escherichia coli >gp:gp|D902
12|ECORNPBW_3 E.coli rnpB gene and ORFs. NID: g216630. >gp:g
p|U18997|ECOUW67_54 Escherichia coli K-12 chromosomal region
from 67.4 to 76.0 minutes. NID: g606010. >gp:gp|AE000394|AE
000394_2 Escherichia coli K-12 MG1655 section 284 of 400 of
the complete genome. NID: g2367197.

atgtttaaaataatttttggaaaagagaaaaataaggtggttaagacaatgaaagtttta
55 gtagccatggatgaatttaatggaattatttctagttaccaagctaatagatatgttgaa 
gaagcggtagcaagtcaaattgaagatgcagatatcgttcaagttccactatttaacggt 
cgtcacgaattattagattcagtctttctttggcaatcaggaaataaatatcgtgtgagt 
gcgcatgatgctgacatgaaagaaaccgaagcaatatatggacaaacggatagtggtatg 
actattatcgaaggtcacttatttttaaatggcaaaaaacctattcaacatcgatcaagt 
tacggtttgggagaggttataaaagcagcattggacaatcatacagaacatcttgttatt
tctttaggtggaataggaagttttgatggcggtgcaggcatgttgcaagcattgggtgca 
acattttatgatgatgaagcacaaattgtcgatatgaggaaaggtgcatatttaataaaa 
tatattagacgtattgatttatcaggtgttcatccacaattaacaaaggtaaacattcaa 
ttaatgtcagatttctcaagtcgattgtatgggaaaaaaagtgaaatcatgcaaacatac
gaatcattagatttgtctcaaaatgaagcagccgagatagataatttaatttggtatttt 
agtgaattatttaagaatgaattgaaaatagcaatgggaccaatcgagcgcggtggtgct 
ggaggtggtatagcagctgtattaaatagtctctatcaagctgagattttaacaagccat 
gaattagtgaatcaaatcacacatttagaaaacttaattcaacaggcagatcttattatt 
10 ttcggagaaggtttgaaagaagaagatcaaattctagagactacaacaatacgtatagca 
gaacttacccaacaatacagcaagccagctattgcaatttgtgctacaaatgataaattt 
gatttgtttgaatcattgaatgttacagcaatgtttaatacatttattgatatgcctgat 
tcatatacagattttaagatgggtattcaaatcagacattacacagtacaagcactaaaa 
ttattgaaaacgcaaataaacttaccgctttcatcctaa 


Sequence 2794 

MFKIIFGKEKNKVVKTMKVLVAMDEFNGIISSYQANRYVEEAVASQIEDADIVQVPLFNG
RHELLDSVFLWQSGNKYRVSAHDADMKETEAIYGQTDSGMTIIEGHLFLNGKKPIQHRSS
YGLGEVIKAALDNHTEHLVISLGGIGSFDGGAGMLQALGATFYDDEAQIVDMRKGAYLIK
YIRRIDLSGVHPQLTKVNIQLMSDFSSRLYGKKSEIMQTYESLDLSQNEAAEIDNLIWYF
SELFKNELKIAMGPIERGGAGGGIAAVLNSLYQAEILTSHELVNQITHLENLIQQADLII
FGEGLKEEDQILETTTIRIAELTQQYSKPAIAICATNDKFDLFESLNVTAMFNTFIDMPD
SYTDFKMGIQIRHYTVQALKLLKTQINLPLSS*

Sequence 2795

Contig_0440_pos_4831_4487

is similar to (with p-value 1.0e-60)
>pir:pir|I67760|I67760 transposase (insertion sequence IS10
) - Escherichia coli >gp:gp|S67119|S67119_2 BST=somatotropin
...BST/beta-Gal fusion protein [Escherichia coli, LBB84, pla
smid pXT107, IS10L/R-1, PlasmidTransposonInsertionMutant, 3
genes, 1679 nt]. NID: g455674.

atgcagattgaagaaaccttccgagacttgaaaagtcctgcctacggactaggcctacgc 
catagccgaacgagcagctcagagcgttttgatatcatgctgctaatcgccctgatgctt 
35 caactaacatgttggcttgcgggcgttcatgctcagaaacaaggttgggacaagcacttc 
caggctaacacagtcagaaatcgaaacgtactctcaacagttcgcttaggcatggaagtt 
ttgcggcattctggctacacaataacaagggaagacttactcgtggctgcaaccctacta
gctcaaaatttattcacacatggttacgctttggggaaattatga 

Sequence 2796

MQIEETFRDLKSPAYGLGLRHSRTSSSERFDIMLLIALMLQLTCWLAGVHAQKQGWDKHF 
QANTVRNRNVLSTVRLGMEVLRHSGYTITREDLLVAATLLAQNLFTHGYALGKL* 

Sequence 2797 

Contig_0441_pos_2559_3467

is similar to (with p-value 2.0e-39)

>sp:sp|O54983|CRYM_MOUSE MU-CRYSTALLIN HOMOLOG. >gp:gp|AF03
9391|AF039391_1 Mus musculus mu-crystallin (Crym) mRNA, comp
lete cds. NID: g2745895.

50 atgaatgaagtcattttagaagtagaaaaagctttgcaagctttttcagagaataagacg 
ataacgccattaagatatgttttgccatttaatgagcaaaatcgttatttagtgatgcca 
gcattatcagatgaattaaatatcgttggacttaaaacagtctcatttgcacctgaaaat 
tcaaaaaaagggaaagcgactattactggatcagttattttaagtgactatgaaacagga 
gaaacattgtctatattagatggtggttttcttactaaagtaagaactggtgcaatttca

ggtgtagctactaaatatctagcaaaagaaaacgctaaaacacttagtgtaataggggca
ggtgtacaagctgaaggtttaattgaagcgatacttgctgttagagatattgaaaaaatt
cacatcgcaagtagaacgttcgaaaaagcagaaaaatttgctcaaaatatacgaaatcga 
tttaatattaaagtgagtgtatttagatcggcagatgaagcgatagacagtgcagatatt 
gtagttacagcaacaaatgcaaatcagcccgtttatactcattctttacatccaggcgtg 
catttaaatgcagtcggatcctttaaaccagatatgcaagaaataccttcagaaacaatg
cttgttgctaataaaattgttgttgaatctatggaagcagctttagaagaaacaggtgat 
ttaaaaattcctcaagcagaaggaatattaactaaaaatatgctacatagtgaattaggc 
gacattatttctggtgaaaaagttggccgcgaaactgaagaagaagtgacagtctttaaa 
5 tcggtcggtctagcaattgtagatatcattgtggcacaatatttttataaaaaattaata 
caatcttag 

Sequence 2798 

MNEVILEVEKALQAFSENKTITPLRYVLPFNEQNRYLVMPALSDELNIVGLKTVSFAPEN
SKKGKATITGSVILSDYETGETLSILDGGFLTKVRTGAISGVATKYLAKENAKTLSVIGA
GVQAEGLIEAILAVRDIEKIHIASRTFEKAEKFAQNIRNRFNIKVSVFRSADEAIDSADI
VVTATNANQPVYTHSLHPGVHLNAVGSFKPDMQEIPSETMLVANKIVVESMEAALEETGD
LKIPQAEGILTKNMLHSELGDIISGEKVGRETEEEVTVFKSVGLAIVDIIVAQYFYKKLI 
QS* 


Sequence 2799 
Contig_0441_pos_4023_4979
unknown 

atgaaaaaaataataatcactggagcacttggtcaaattggtactgaattagttattaag 
20 tgtagagaaagatatggaactgaaaatgtattagctactgatataagaaagccagaacca 
cactctccagttaaaaacggtccttttgaaatcttagatgttacagatagaaatcgttta 
tttgaaacagtgagatatttcaatgctgatacacttatgcatatggctgcactgctttct 
gctacagcggaaaagaaaccactagttgcctgggatttgaatatggggggacttataaat 
acactcgaagcagcaagaaggtatcaattaaaatattttacgccaagttcaataggagct 
25 tttggtatttcgacacctaaagtaaatacacctcaattaacaatacaacaaccaacgact 
atgtatggtataaacaaagtaactggggaattattatgtcagtattattatgtcaaattt 
ggagtagatacacgtagcgttagatttcctggtttaatttcacatgttaaagaacctggc 
ggi.gc,'cataacagattatgcagttgatatgtattttaaggcagttagaaaaggac.-iutat 
acaagttatattaatcgttacacttatatggatatgatgtacatggaagatgctaltgat 
30 gcgattatcaaattaatggaagaagacagtgttaaattaaaaacaagaaatggctataac 
ttaagtgcgatgagtattgaacctgaaatgctaaaacaagctatacaagtatactatcca 
gatttcaccttagattatgacattgatttagagcgtcaagatattgcattaagttggcca 
gatagtatagatacgagctgtgctcaggaagagtggggatttgatcctaaatatgattta 
ccaactatgactaaagtgatgttagaggcaattgagaaaaaacaaaaagaatgctga 


Sequence 2800 

MKKIIITGALGQIGTELVIKCRERYGTENVLATDIRKPEPHSPVKNGPFEILDVTDRNRL 
FETVRYFNADTLMHMAALLSATAEKKPLVAWDLNMGGLINTLEAARRYQLKYFTPSSIGA 
FGISTPKVNTPQLTIQQPTTMYGINKVTGELLCQYYYVKFGVDTRSVRFPGLISHVKEPG 
40 GGITDYAVDMYFKAVRKGHYTSYINRYTYMDMMYMEDAIDAIIKLMEEDSVKLKTRNGYN 
LSAMSIEPEMLKQAIQVYYPDFTLDYDIDLERQDIALSWPDSIDTSCAQEEWGFDPKYDL 
PTMTKVMLEAIEKKQKEC* 

Sequence 2801 
Contig_0442_pos_1526_2359

is similar to (with p-value 1.0e-60)
>pir:pir|I67760|I67760 transposase (insertion sequence IS10
) - Escherichia coli >gp:gp|S67119|S67119_2 BST=somatotropin
...BST/beta-Gal fusion protein [Escherichia coli, LBB84, pla
smid pXT107, IS10L/R-1, PlasmidTransposonInsertionMutant, 3
genes, 1679 nt]. NID: g455674.

atgaagagtattttatttttaagtaataatgtgaaaatattcacaaaaaaactaggagga 
tttgctatgagtaaggaaatattcgatacttttaaatttaaatgtggtgccgaattaaaa 
aatagagtattaatggcacccatgactatccaagctgggtattttgatggaagtgttaca 
55 tcagaaatgattgattattatcaatttagagctggtgatgcttcagcaatcattgttgaa 
agttgttttgttgaaaatcacggacgaggatttccgggagctataggtattgataatgat 
gacaaaatacctggactcaaacgtttagcagaagcgattcaagctaagggatcaaaagcg 
attttgcaactttatcatgccggaagaatggcaaatcctaaatttaatgaaggagagcag 
ccgatatctgcgagccccattgcagcattaagacctgatgctgtaccacctagagaaatg 






acacatgctcaaatcaatcagatgattgatgactttggagaggctacacgtcgcgctata
gaagcggggtttgatggtgtcgaaattcatggcgccaacacatacttattacaacaattt 
ttctctccacattctaatcggagacaagattcatggggaggcagtcgtgaaaaacgtaca 
cgatttccaatcgaagttttgacaaaggttcaacacgtcgttgctgaaaaagaggcttct 
5 cattttattataggatatcgattctcacctgaagaaattgaagaaccaggcatacgtttt 
gaagataccatgtttttactaaatacattagcagaatgcttgactacaaaatag 

Sequence 2802 

MKSILFLSNNVKIFTKKLGGFAMSKEIFDTFKFKCGAELKNRVLMAPMTIQAGYFDGSVT
SEMIDYYQFRAGDASAIIVESCFVENHGRGFPGAIGIDNDDKIPGLKRLAEAIQAKGSKA
ILQLYHAGRMANPKFNEGEQPISASPIAALRPDAVPPREMTHAQINQMIDDFGEATRRAI
EAGFDGVEIHGANTYLLQQFFSPHSNRRQDSWGGSREKRTRFPIEVLTKVQHVVAEKEAS
HFIIGYRFSPEEIEEPGIRFEDTMFLLNTLAECLTTK* 

Sequence 2803

Contig_0442_pos_13117_13461

is similar to (with p-value 5.0e-53)

>sp:sp|P54524|YQIG_BACSU PROBABLE NADH-DEPENDENT FLAVIN OXI
DOREDUCTASE YQIG (EC 1.-.-.-). >gp:gp|D84432|BACJH642_230 Ba
cillus subtilis DNA, 283 Kb region containing skin element.
NID: g2627063. >gp:gp|Z99116|BSUB0013_132 Bacillus subtilis
complete genome (section 13 of 21): from 2395261 to 2613730.
NID: g2634723.

atgcagattgaagaaaccttccgagacttgaaaagtcctgcctacggactaggcctacgc 
25 catagccgaacgagcagctcagagcgttttgatatcatgctgctaatcgccctgatgctt 
caactaacatgttggcttgcgggcgttcatgctcagaaacaaggttgggacaagcacttc 
caggctaacacagtcagaaatcgaaacgtactctcaacagttcgcttaggcatggaagtt 
ttgcggcattctggctacacaataacaagggaagacttactcgtggctgcaaccctacta 
gctcaaaatttattcacacatggttacgctttggggaaattatga 


Sequence 2804 

MQIEETFRDLKSPAYGLGLRHSRTSSSERFDIMLLIALMLQLTCWLAGVHAQKQGWDKHF 
QANTVRNRNVLSTVRLGMEVLRHSGYTITREDLLVAATLLAQNLFTHGYALGKL* 

Sequence 2805

Contig_0445_pos_0_322

is similar to (with p-value 8.0e-17)

>gp:gp|M89774|ECOLYSP_2 Escherichia coli lysine specific pe
rmease (lysP) gene, complete cds. NID: g466776. >gp:gp|AE000
305|AE000305_1 Escherichia coli K-12 MG1655 section 195 of 4
00 of the complete genome. NID: g1788479.

gtggcaactggaagtgtcatttctcaagctggcccaggaggagctatattagcttatata 
ctaattggtattatgctttattttttaatgtcatcaataggagaattagcaactttctat 
ccggtttctggttcttttagttcatactctaccagatttgttgattcgtcacttggtttt 
45 acaatgggttggttgtattggggtatgtggtcacttgtaacaagtgtagatatcattgtt 
gcttccaatgtattacaatattgggatgtatttaaagttttaaatccacttacatggagc 
ttaattttcttaactctgttgt 

Sequence 2806 

50 VATGSVISQAGPGGAILAYILIGIMLYFLMSSIGELATFYPVSGSFSSYSTRFVDSSLGF 
TMGWLYWGMWSLVTSVDIIVASNVLQYWDVFKVLNPLTWSLIFLTLLX 

Sequence 2807 

Contig_0447_pos_14098_15573
is similar to (with p-value 6.0e-40)

>gp:gp|X93084|MBFMDSUBS_1 M.barkeri fmdE, fmdF, fmdA, fmdC,
fmdD, fmdB, orf4, orf3, orf2, and orf1 genes. NID: g1124956

atga.'i-aaoattgattactcttattgtcatgatatcttttgttttagcgagctgtgrfgggc 
acatcaagtacagacaaagacaccctcaatgttgaaatacctttgaaaactaaatcaatt
gcaccttacgaaactgatatcccagttaaaacaggtgccttggaatcgctttttaaaatg 
tcgaagaatggtaaagtaaaacctttattagtcaaaaattatcatcaagtatctgacaat 
caactagaactcactttaaaagataatattaaatttcaaaacggtcatcatttaacaggc 
5 gaagctgtaaaacgtagtctcgaagaaggaatgaaaaaaagtgatttgttaaaaggatca 
cttcctattaaatcaatcaatgctcatggacaaaaagtcacaatcactactaaagaacct 
tatccagaattaatgtctgaactcgcaagcccatttgctgctatttacgacacaaaagct 
aaaaacaaagtaactgatcaacctgttggtacgggtccttataaaattgatcagtataaa 
cgttcgcaaaaaatcgtactaaaacaattcaaagactactggcaaggtacgccaaaatta 

aaaagaattaatgtcacttatcatgaagatggtaatactcgtgttgatcacttattatca
ggcaaatcagatttgactactgatgttccaattgaacgcgttgatgatgtaaaaaaatct 
aacaaagcaaacattcaaagtacatcaggctttagaacgcatttaatgttatacaatcat 
gatagtaaaaaagttaataaaaaagtaagagaagcactagatatgattattaatcgaaaa 
gacattgctaaaaatgtttctaaaaattatgctgagccagcatcaggtccttttaaccat 

15 cgattaaaatcattagaaaaagaggaaattcaatcacaagacatcaagcgtgcaaaagaa 
cttttagctcaagaaggttattccaaatcgcatcctcttaaattaaacatggtcacatac 
gatggcagaccagaattgcctaaaattggacaggtgatacaatctgaagctaaaaaggca
aatgttgatatacaattacgcaatgtagatgatatcgaaggatatctcaaaaacaaacag
agttgggatgtttcaatgtatagttatttaagtgtgccacgtggtgatacaggttatttc 

20 tttaacactgcatacttacctgatggagcattaaataaaggtaattatagtaatactaaa 
gtcactcagttaattaaagaattaaatactactttcggtgacaagcaacgtggtcaagtg 
actaatgaaatactaaatgaatctaaaaaagatattcctaacagctatatcacatacaac 
tctcaaatagatggtgtgaataataaagtaagacattttaatgttacaccagaatctatc 
tatttaattgattataaattaagtaaaaaagaataa 


Sequence 2808 

MKKLITLIVMISFVLASCGGTSSTDKDTLNVEIPLKTKSIAPYETDIPVKTGALESLFKM
SKNGKVKPLLVKNYHQVSDNQLELTLKDNIKFQNGHHLTGEAVKRSLEEGMKKSDLLKGS
LPIKSINAHGQKVTITTKEPYPELMSELASPFAAIYDTKAKNKVTDQPVGTGPYKIDQYK
RSQKIVLKQFKDYWQGTPKLKRINVTYHEDGNTRVDHLLSGKSDLTTDVPIERVDDVKKS
NKANIQSTSGFRTHLMLYNHDSKKVNKKVREALDMIINRKDIAKNVSKNYAEPASGPFNH
RLKSLEKEEIQSQDIKRAKELLAQEGYSKSHPLKLNMVTYDGRPELPKIGQVIQSEAKKA
NVDIQLRNVDDIEGYLKNKQSWDVSMYSYLSVPRGDTGYFFNTAYLPDGALNKGNYSNTK
VTQLIKELNTTFGDKQRGQVTNEILNESKKDIPNSYITYNSQIDGVNNKVRHFNVTPESI
YLIDYKLSKKE*

Sequence 2809 

Contig_0447_pos_20772_21911 
>gp:gp|Z82038|CTZ82038_4 C.thermosaccharolyticum etfB, etfA
, hbd, thlA and actA genes. NID: g1667352. >gp:gp|Z92974|TTB
CSOPRN_6 T.thermosaccharolyticum BCS operon DNA. NID: g19033
26.

gtgtttggtggtgtatttaaggatatacctgcctatgaactaggtgcaacagttattcgt 
caaattttagaacatagtcaaatagatcctaatgaaatcaatgaagttattctaggaaac 

45 gtattacaggcaggtcaaggacaaaatcctgctcgtattgctgcgattcatggtggtgtg 
ccagaagcggtaccttcttttactgtaaataaagtttgcggttctggattaaaagcgatt
caacttgcctatcaatctattgtagcgggagataatgagattgttatcgctggaggcatg 
gaaagtatgtctcaatctccaatgcttcttaaaaatagtcgtttcggttttaaaatggga 
aatcaaactttagaagatagtatgatagctgatggtttaactgataagtttaatgattac 

50 catatgggtatcacagccgaaaatctagttgaacagtatcagattagtcgtaaaga hcaa 
gatCi.\at^.tgcattcgattctcaacaaaaagcatcacgtgcacaacaagctggtg*.:attt 
gatgctgaaattgtacctgtagaggtaccacaacgtaaaggcgaccccctaattatttct 
caagatgaaggcattagacctcaaacgacaattgataagttagcacaactccgtccagca 
tttaaaaaagatggatcagtaactgctggtaatgcatccggtatcaatgacggtgctgct 

55 gctatgctcgttatgacggaggacaaagcgaaagcattgggcttacaacctatagctgta 
ttagatagttttggtgcgagtggtgtggcgccttcaattatgggtattggtccagttgaa 
gcgatacataaagctttaaaacgttctaataaagtgataaatgatgttgatatttttgaa 
ttaaacgaagcttttgcagcgcaatcaattgctgtaaaccgtgagttgcaattaccgcaa 
gataaagtcaatgttaatggtggtgcgattgcactaggacatccgataggggcttcgggt 
gcgcgtactttagtttcattattacatcaattaagtgatgctaagccaacaggtgtggca
tctttatgtatcggtggcggtcaaggtatcgctacggttgtatctaaatatgaagtttaa 

Sequence 2810 

5 VFGGVFKDIPAYELGATVIRQILEHSQIDPNEINEVILGNVLQAGQGQNPARIAAIHGGV 
PEAVPSFTVNKVCGSGLKAIQLAYQSIVAGDNEIVIAGGMESMSQSPMLLKNSRFGFKMG 
NQTLEDSMIADGLTDKFNDYHMGITAENLVEQYQISRKEQDQFAFDSQQKASRAQQAGVF 
DAEIVPVEVPQRKGDPLIISQDEGIRPQTTIDKLAQLRPAFKKDGSVTAGNASGINDGAA 
AMLVMTEDKAKALGLQPIAVLDSFGASGVAPSIMGIGPVEAIHKALKRSNKVINDVDIFE 
10 LNEAFAAQSIAVNRELQLPQDKVNVNGGAIALGHPIGASGARTLVSLLHQLSDAKPTGVA 
SLCIGGGQGIATVVSKYEV*

Sequence 2811 

Contig_0447_pos_24387_22894
>pir:pir|S57636|S57636 5-methyltetrahydropteroyltriglutamat
e--homocysteine S-methyltransferase (EC 2.1.1.14) - Madagasc
ar periwinkle >gp:gp|X83499|CRMETS_1 C.roseus MetE mRNA for
methionine synthase. NID: g886470.

gtgggcggtttaggattagatttggtacacgataacggctataacttaaaacaaattgaa 

20 gatggtaatttcgatcaaagtaaagcactttatgcaggaatcattgatggcagaaatgta 
tgggcggctgatattgaagcaaaaaaacaattaatcgaaacattacaacaacacacacaa 
cagttagtcattcaaccttcatcatcacttctacatgttccagtatcactagatgatgaa 
acacttgatgaatctattgcagaaggattaagttttgctactgaaaagttagacgaatta 
gatgcattgcgtcgcttattcaatgataatgacctgtcaaaatatgaacattacaaagca 

25 cgttatgaacgcttccaaagccaatcatttaaaaatcttgaatacgattttgaaagtgta 
ccgacacatcgtaaatcaccatttgctaaacgcaagcaattacaaaaccaacgtctaaat 
ttaccagatttacctacgactacgattggttcattccctcaaactcgtgaagtccgtaaa 
tttagagcagattggaaaaataatcgcattacagatgctgagtatcaagaatttttacaa 
aatgaaatcgctcgttggattaaaatccaagaagatattggtttagacgtacttgtacac 

30 ggtgagtttgagcgtaacgacatggttgaattttttggggaaaaacttcaaggtttccta 
gtaacaaaattcggttgggtacagtcttacggttctcgtgcagttaaaccaccagttata 
tatggagatgtgaagtggactgccccacttactgtaaaagaaacagtatacgcgcaaagc 
ttaactgacaaaccagtaaaaggtatgcttacaggtccagttactattttaaactggtca 
tttgaacgcgtagacgtacctagaaaggttgttcaagatcaaattgccttagccattgat 

35 gaagaagtgttagctttagaagaagcaggtatcaaagtcattcaagttgacgaaccaacg 
ttgcgtgaaggcttacctctacgctctgaatatcacgaacaatatttagaagatgctgta 
cattcatttaaattagcgacatcatctgtgcatgatgaaacacaaattcatacacatatg 
tgctattctcaatttggacaaatcattcatgctatccatgatttagatgcagatgttatt 
tctattgaaacttctcgtagtcacggagacttaattcaagatttcgaagatattaattat 

40 gatttaggaattggattaggtgtttacgatattcacagtccacgtattccaaccgaagaa 
gagattactactgcaattaatcgttcattacaacaaattgatcgttctctattctgggtt 
aacccagactgtggtctaaaaacacgtaaagaaaatgaagtgaaagatgcattaacagtg 
ttagtcaatgcagttaaaaagaaacgtcaagaatctgaatcaacaacagcataa 

Sequence 2812

VGGLGLDLVHDNGYNLKQIEDGNFDQSKALYAGIIDGRNVWAADIEAKKQLIETLQQHTQ
QLVIQPSSSLLHVPVSLDDETLDESIAEGLSFATEKLDELDALRRLFNDNDLSKYEHYKA 
RYERFQSQSFKNLEYDFESVPTHRKSPFAKRKQLQNQRLNLPDLPTTTIGSFPQTREVRK 
FRADWKNNRITDAEYQEFLQNEIARWIKIQEDIGLDVLVHGEFERNDMVEFFGEKLQGFL 

50 VTKFGWVQSYGSRAVKPPVIYGDVKWTAPLTVKETVYAQSLTDKPVKGMLTGPVTILNWS 
FERVDVPRKVVQDQIALAIDEEVLALEEAGIKVIQVDEPTLREGLPLRSEYHEQYLEDAV 
HSFKLATSSVHDETQIHTHMCYSQFGQIIHAIHDLDADVISIETSRSHGDLIQDFEDINY 
DLGIGLGVYDIHSPRIPTEEEITTAINRSLQQIDRSLFWVNPDCGLKTRKENEVKDALTV 
LVNAVKKKRQESESTTA* 


Sequence 2813 
Contig_0450_pos_7632_8510
No hits found 

atgaaagcgccagttctggtatcaggtactgacggtgtgggtacaaagttaaaattagca 
attgactatggaaagcatgacacaattggtattgatgctgtcgcaatgtgtgtaaatgat
attttaacaacaggtgctgaacctttatactttttagactatattgccacgaataaagta 
gtgccaagtactatagagcaaatcgttaaaggtataagtgacggttgcgaacaaaccaat 
acggcacttataggcggtgaaactgctgaaatgggagaaatgtatcatgaaggtgaatat 
5 gatattgctggttttgcagtaggagcggtagagaaagaggactatattgatggttcaaat 
gttgaagaaggacaagcaattattggtttagcttcaagtggtattcattcaaatggctat 
agtctagttagaaaaatgataaaagaatcaggagttcaattacatgatcaatttaatggt 
caaacctttttagaaaccttccttgcaccaacaaaattgtatgtaaagcctattcttgaa 
ttaaagaaacatattgatatcaaagcgatgagccatattactggtggaggtttctatgaa 
10 aatattccgcgtgcccttcctaaaggtttatcagcaaaaatagatacacaatcattccca 
acgttggaagtctttaattggcttcaaaaacagggcaacatttcaacgaatgaaatgtat 
aacatatttaatatgggtattggatatacaattattgttgacaaaaaagatgttcaaaca 
acattaacaacgttacgtgcaatggatacaactgcatatgaaattggtgagattataaaa 
gatgatgatacacctattcatttattggaggtagaatag


Sequence 2814 

MKAPVLVSGTDGVGTKLKLAIDYGKHDTIGIDAVAMCVNDILTTGAEPLYFLDYIATNKV 
VPSTIEQIVKGISDGCEQTNTALIGGETAEMGEMYHEGEYDIAGFAVGAVEKEDYIDGSN 
VEEGQAIIGLASSGIHSNGYSLVRKMIKESGVQLHDQFNGQTFLETFLAPTKLYVKPILE 
20 LKKHIDIKAMSHITGGGFYENIPRALPKGLSAKIDTQSFPTLEVFNWLQKQGNISTNEMY 
NIFNMGIGYTIIVDKKDVQTTLTTLRAMDTTAYEIGEIIKDDDTPIHLLEVE* 

Sequence 2815 

Contig_0450_pos_13563_14528

is similar to (with p-value 9.0e-27)

>gp:gp|Z99108|BSUB0005_100 Bacillus subtilis complete genom
e (section 5 of 21): from 802821 to 1011250. NID: g2633055.
>gp:gp|D78508|D78508_5 Bacillus subtilis DNA for YfiO, YfiP,
YfiN, YfiM, YfiL, YfiK, YfiJ, YfiI, YfiH, complete cds. NID
: g1817531.

gtgggtaaaggcaacaatcttggcaaaaatgctgctctcaaaacttataatgagaatcaa 
gcgcaacatctgcttaaacagaatcaattacaaggatactttgtcttcgaccgtggtatg 
accgatactttttataaagatggtagccttcctataactatttatacatatgatgaricaa 
tcaagtaacagtgtcgtggttaatcaattaacacgctcagtttatgaccgtttaatgtta

35 tcaatgggcggtgtgctaagctttaatcaattagctaaagatccttcaaatgaagacgta 
gcaatgacattgattgatatgttatttaccggtttaaatcgttcaggttcatttaatttt 
gaacccatacatatttatgacaccagtagttattatgtagtcactggatttcttttgtct 
atctttatattgtgtttatcactttatacagtactaaaaatgaatcaagaaactgcactc 
aaagaacgcttgcaaatgtttcatttttcctttgaaaagctcacgatagttcgaggtatc 

40 attgcatggttttattcactcatatgggcatttattggctttatctggattactcatgct 
ctaaatgccccatttgaaaaatacaattggccaacggtagctttacaactcacttattac 
gttacgttactcgtcctatgcttattacttatagacttaattactcgttcatggataaac 
tttctactcaaattattacttagctttgttatcgttattttttctgggataattatccct 
actatcttctttaaacacatgcttaatgatgtaatcattacacaaccatttagtttggtt 

45 actaatcaaatgttagaaataacactcaataactatattttagacacacatccagcattt 
tatttaagttttattacactattgatactattcatcattgttttagtatggaggtatcgc 
cgatga 

Sequence 2816 

50 VGKGNNLGKNAALKTYNENQAQHLLKQNQLQGYFVFDRGMTDTFYKDGSLPITIYTYDEQ 
SSNSVVVNQLTRSVYDRLMLSMGGVLSFNQLAKDPSNEDVAMTLIDMLFTGLNRSGSFNF
EPIHIYDTSSYYVVTGFLLSIFILCLSLYTVLKMNQETALKERLQMFHFSFEKLTIVRGI
IAWFYSLIWAFIGFIWITHALNAPFEKYNWPTVALQLTYYVTLLVLCLLLIDLITRSWIN
FLLKLLLSFVIVIFSGIIIPTIFFKHMLNDVIITQPFSLVTNQMLEITLNNYILDTHPAF

55 YLSFITLLILFIIVLVWRYRR* 

Sequence 2817 

Contig_0450_pos_15460_16149

is similar to (with p-value 1.0e-18)
>sp:sp|P22082|SNF2_YEAST TRANSCRIPTION REGULATORY PROTEIN S
NF2 (SWI/SNF COMPLEX COMPONENT SNF2) (REGULATORY PROTEIN SWI
2) (REGULATORY PROTEIN GAM1) (TRANSCRIPTION FACTOR TYE3). >p
ir:pir|S15047|S15047 SNF2 protein - yeast (Saccharomyces cer
evisiae) >gp:gp|X57837|SCSNF2_1 Yeast GAM1/SNF2 gene for a n
uclear protein required for transcription of STA1 gene. NID:
g4499. >gp:gp|X89633|SCVCOSMGN_20 S.cerevisiae DNA for VPH1
, MOD5, CAP20, ORF1 and SNF2 genes. NID: g1279694. >gp:gp|Z7
5198|SCYOR290C_1 S.cerevisiae chromosome XV reading frame OR
F YOR290C. NID: g1420643. >gp:gp|D90459|YSCRIC1_1 Yeast RIC1
gene (regulatory gene for phospholipid synthesis), complete
cds. NID: g806531. >gp:gp|M61703|YSCSNF2A_1 Saccharomyces c
erevisiae SNF2 protein gene, complete cds. NID: g172631.
atgatagatattcaaaatgtttccaaaagctataaaaagaagcatattttcgattcccta 
15 gatatgcaatttcaaaatcataaaattactattttattaggtgaaaatggtgctggaaaa 
tctacattattgcgtttaattgcaggtattgagaatgcagacgaaggacgtattcaatac 
ttcaatcaatatttgtcaagacgtcgaatacgtcatattgtaggctatgtccctcaagac 
atcgcactattcgagcatatgactgtcatggagaacattgagtttttcaagtcactttgt 
gaaaatcctatttcagatgaaacacttcattcttatttatcacaattaaattttactgat 
20 acaaaagtgaaagtatctaacctttctgggggaaataaacgtaaagtcaatattatgata 
ggtctacttagtcggcctaaaatacttattctagatgagccaacagaaggcattgattta
gaatcaagatatgatattcacaacttattacaacaaatgaccgatcaatgtttaatcatc 
atgacgacacatcatttagacgaagttgaagcactagcagatgatattaaagttataggt 
caaaatcctttttatcatgatattttagaaaataaaggttggtcttttaaaaaatatgca 
aatgccttagctgataatacgaaatcttaa

Sequence 2818 

MIDIQNVSKSYKKKHIFDSLDMQFQNHKITILLGENGAGKSTLLRLIAGIENADEGRIQY 
FNQYLSRRRIRHIVGYVPQDIALFEHMTVMENIEFFKSLCENPISDETLHSYLSQLNFTD 
30 TKVKVSNLSGGNKRKVNIMIGLLSRPKILILDEPTEGIDLESRYDIHNLLQQMTDQCLII 
MTTHHLDEVEALADDIKVIGQNPFYHDILENKGWSFKKYANALADNTKS* 

Sequence 2819 

Contig_0450_pos_20344_21786
is similar to (with p-value 5.0e-85)

>sp:sp|P43848|PUR5_HAEIN PHOSPHORIBOSYLFORMYLGLYCINAMIDINE
CYCLO-LIGASE (EC 6.3.3.1) (AIRS) (PHOSPHORIBOSYL-AMINOIMIDAZ
OLE SYNTHETASE) (AIR SYNTHASE). >pir:pir|G64122|G64122 5'-ph
osphoribosyl-5-aminoimidazole synthetase (purM) homolog - Ha
emophilus influenzae (strain Rd KW20) >gp:gp|U32822|U32822_2
Haemophilus influenzae Rd section 137 of 163 of the complet
e genome. NID: g1574265.

atgaaaagagcagccaatttttctgtacctggagctggtaaaacggctatgatgtatggc 
acatttgcttttttgtctagtgaaataaagcggaaagttgataaattaatagttatttct 

45 ccaattaatgcatttgaagcttggcgttcagaatttattgaagtttttcaagataaaaga 
gaattacactttatgaacctaagagataaaaaatataatgatttaggtaaagtacgaaca 
gattggggaagtgcaaatgtcattgttttgaattttgaagcaatacaaaagtatgtaggg 
gttttaaatgaacttattaatgataagacaatgatagtttatgatgaggttcataggata 
aaaggtattaatagtagcagagcaagttatgcattaactttaggtcctaaaagttattac 

50 agatacgttttaactggtaccccaattccaaatagttatcaagacatatttaacttctta 
aatattttatataaagatgagtatgatacttattttggttggaatgttgctgatttacaa 
aatcctgatcctaatgaaattaatgacaagttgaacccttttttctggcgtacaaataag 
aacgatttggaagtgcctcaagcagaaaatgatattatattatgtgttaagcctagtaat 
attcaaattgaattagcaaaagcgatatacgaaaatgaatctgggatactagcgatttat 

55 ataaggttactacaagcttcaactaacccagaattattgcaaaagaatattaattatagc 
gaactaggaatgttgaatgatgaattaaatttcgatttggataaagcattaaataaagaa 
gaagaaaatcaagtaaaacaacaaatttataattcttttgatttaaaaaatgtaacttct 
ccaaaattcgaaaaaggtattgaattgattgaaaacttagttagccaaggaaaaaaagtt 
ctagtatggggattgtttgtaggtacaatgaataaaatcaataagaggttactagaaagt 
gatattaattcaatattgatttatggagaaacacctaaagaagatagggtagatatgatc
aataattttaggaatggaaatgcacaagttctaatatctaatcctaatacattaggcgag 
tccatatctttacatcagacagtacatgatgcaatatattttgaatataactttaattta 
acgtttatgttgcaatcacgtgatagaatacatcgtttagggttaaataataatcaatat 
5 acaaggtattattatttgatgtctgaaggggatagagcccataaaggttttatcgataaa 
gcagtttataacagactgaaggaaaaagaagatgtaatgttaaatgctattgatggaaat 
actttaaagccaatgattgaagatgattacttagaagatgttaagaaaattattattgaa 
tga 

Sequence 2820

MKRAANFSVPGAGKTAMMYGTFAFLSSEIKRKVDKLIVISPINAFEAWRSEFIEVFQDKR 
ELHFMNLRDKKYNDLGKVRTDWGSANVIVLNFEAIQKYVGVLNELINDKTMIVYDEVHRI 
KGINSSRASYALTLGPKSYYRYVLTGTPIPNSYQDIFNFLNILYKDEYDTYFGWNVADLQ 
NPDPNEINDKLNPFFWRTNKNDLEVPQAENDIILCVKPSNIQIELAKAIYENESGILAIY 

15 IRLLQASTNPELLQKNINYSELGMLNDELNFDLDKALNKEEENQVKQQIYNSFDLKNVTS 
PKFEKGIELIENLVSQGKKVLVWGLFVGTMNKINKRLLESDINSILIYGETPKEDRVDMI
NNFRNGNAQVLISNPNTLGESISLHQTVHDAIYFEYNFNLTFMLQSRDRIHRLGLNNNQY
TRYYYLMSEGDRAHKGFIDKAVYNRLKEKEDVMLNAIDGNTLKPMIEDDYLEDVKKIIIE
*


Sequence 2821 
Contig_0451_pos_806_1621
is similar to (with p-value 1.0e-20)
>gp:gp|U67964|EVU67964_2 Ectromelia virus H14-B and H14-E g
enes, complete cds. NID: g2145123.

atgcattatataaaatttattgagtcaaaagataatacaaaactttatatgaaagtgaat 
gatattcaagatgcaaaagcgaatatcattatagctcatggtgtggcagaacatttagat 
cgttatgatgagataacagcatatttaaatgaagcgggttttagtgttattagatatgat 
caaagagggcatggtcgttctgaaggcaagcgtgccttttatagcaatagtaatgaaatt 

30 gtcgaagatttagatgcgataataaattatgtgaagtcaaactttgaaggtaaagtttac 
ttaatcggtcatagtatgggtggttatacagtcactttatatggaacgaaacatccaaat 
acagtgaatggtattataacttctggagcattaacacgttataataataaactatttggc 
aatcctgatagaaacatatcacctgatacttatatagaaaacaatttaagtgagggggta 
tgttctgatttagaggtaatggaaaaatataaacttgatgatttgaatgcgaaacaaatc 

35 tctatggggctcgtcttttcaataatggatggtgttaggtatttgaaagacaatgctcaa 
caatttacagataatattttgatattgcatggcaaggaagatgggctagtaagctatgta 
gattctttacagctttatcaagaaataggatcagcacataaatcattacacatctatgat
cgtttggagcatgaaatatttaatgaaagttcttataatagaactatttttaacgaagtt 
attgaatggcttgaaacggaattaacttataactaa 


Sequence 2822 

MHYIKFIESKDNTKLYMKVNDIQDAKANIIIAHGVAEHLDRYDEITAYLNEAGFSVIRYD 
QRGHGRSEGKRAFYSNSNEIVEDLDAIINYVKSNFEGKVYLIGHSMGGYTVTLYGTKHPN 
TVNGIITSGALTRYNNKLFGNPDRNISPDTYIENNLSEGVCSDLEVMEKYKLDDLNAKQI 
45 SMGLVFSIMDGVRYLKDNAQQFTDNILILHGKEDGLVSYVDSLQLYQEIGSAHKSLHIYD 
RLEHEIFNESSYNRTIFNEVIEWLETELTYN* 

Sequence 2823 
Contig_0455_pos_4830_4159
is similar to (with p-value 4.0e-17)

>sp:sp|Q10092|YAOD_SCHPO HYPOTHETICAL 24.2 KD PROTEIN C11D3
.13 IN CHROMOSOME I. >gp:gp|Z68166|SPAC11D3_13 S.pombe chrom
osome I cosmid c11D3. NID: g1107889.

gtgtatatattcatgagtaaaaaagttttatttgttttaacaagtacaagtcaatttaca 
55 gacggtacagaaactggattatggttagaagaagctggagcaccatataatatattgact 
gaagaaggtatcaatgttgatgttatttctattaaaggtggaaaagtaaatcttgritcct 
aattctgtttctaatgaatcactgaatcagtatgctaaattcgtgtcacacttaaacgat 
acacctagtatcgaaaatgtaaatgcagatgagtatgacgctatttatctaccaggtgga 
catggtactgtatacgattttgccaataatgagaaattagctgatattttacttcaattt 
aaaaatagtaataaaataatctcttcagtatgtcatggacctagtgcgtttgtaggtgta
aaagatgcaaataatcactatctagtagatggtgtcaaaataacttcatttactgatagt 
gaagaaaaagcaatgggatttgaaaataaagtaccatttttaactcaatctaaattagaa 
gagcaaggtgcaaattttgtagtgaaagatgactttacatctcacgtagaaaaagacggt 
5 caatttatcactggacaaaatccccaatcaagtgaagacattggtaaagcacttgcaaat 
gaattaaaataa 

Sequence 2824

VYIFMSKKVLFVLTSTSQFTDGTETGLWLEEAGAPYNILTEEGINVDVISIKGGKVNLDP
NSVSNESLNQYAKFVSHLNDTPSIENVNADEYDAIYLPGGHGTVYDFANNEKLADILLQF
KNSNKIISSVCHGPSAFVGVKDANNHYLVDGVKITSFTDSEEKAMGFENKVPFLTQSKLE
EQGANFVVKDDFTSHVEKDGQFITGQNPQSSEDIGKALANELK*

Sequence 2825 

Contig_0458_pos_1499_1038

is similar to (with p-value 4.0e-51)

>sp:sp|O31408|ARGR_BACST ARGININE REPRESSOR. >gp:gp|Y09546|
BSARGR_1 B.stearothermophilus argR gene. NID: g2369705.
gtgttaattgtgccaaaaaagtcagtgagacatataaaaataagagagataatttcaaat 

20 gaacaaatagaaacacaagatgaactagttaaacgtttgaatgagtatgatttaaatgtt 
acacaagctactgtttcacgagatattaaagaattgcaattaattaaagttcctgcacct 
acagggcaatatgtttatagtttaccaaatgatcgtagatatcatccattagagaagttg 
ggtagatatttaatggattcatttgtaaacattgagggtactggtaatctactagttctt 
aaaacgcttcctggtaatgctcaatccattggtgctatacttgatcaaattgattgggat 

25 gaggtacttggtacaatttgtggtgatgatacatgcttacttatttgtcgagacgaagaa 
gcgagtgaagaaatcaaaactcgaattttcaatttattataa 

Sequence 2826

VLIVPKKSVRHIKIREIISNEQIETQDELVKRLNEYDLNVTQATVSRDIKELQLIKVPAP
TGQYVYSLPNDRRYHPLEKLGRYLMDSFVNIEGTGNLLVLKTLPGNAQSIGAILDQIDWD
EVLGTICGDDTCLLICRDEEASEEIKTRIFNLL* 

Sequence 2827 

Contig_0460_pos_2481_3032

is similar to (with p-value 2.0e-28)

>sp:sp|P26646|YHDH_ECOLI HYPOTHETICAL 34.7 KD PROTEIN IN MR
EB-ACCB INTERGENIC REGION (ORF1) (O324). >pir:pir|JS0688|JS0
688 hypothetical 35K protein (fabE 5' region) - Escherichia
coli >gp:gp|M80458|ECOACOAC_1 E.coli biotin carboxylase and
biotin carboxyl carrier protein (fabE) and ORF1 35 kDa prote
in genes, complete cds. NID: g145172. >gp:gp|U18997|ECOUW67_
183 Escherichia coli K-12 chromosomal region from 67.4 to 76
.0 minutes. NID: g606010. >gp:gp|AE000404|AE000404_7 Escheri
chia coli K-12 MG1655 section 294 of 400 of the complete gen
ome. NID: g2367207.

atgtctattgaaggtaaagaagtgcttgtacgaggtgccactggaggcgtcggaacgatt 
tcattactcatgttaaataacttagggtatgatgttattgcaagtacgggtagagal.gac 
gccg.>*agf.aaaacttaaaaagcttggtgctaaagaagtaattggccgtttaccag;»agat 
aatagtaaaccattagagaagagaacatggcaggcagccattgatccagttggtggtgaa 

aacttaccgtacatcgtcaaacgattggataacaacggaagtgttgcattaattggcatg
actggtggtaacaattttgaaacaaccgtctttcctttcattttaagaggagcaagtata 
attggtatcgattcagtatttactccaattaaactaagaaaacgtgtttggagaagactt 
gcaaaagacttaaaaccacaacaattacatgacatcaaacatgttatttcattcgatgaa 
atcccaaaagccatcgatcaagtcatcaatcataataatactggacgtattgtcattgat 

ttcaatgtttaa

Sequence 2828 

MSIEGKEVLVRGATGGVGTISLLMLNNLGYDVIASTGRDDAEEKLKKLGAKEVIGRLPED
NSKPLEKRTWQAAIDPVGGENLPYIVKRLDNNGSVALIGMTGGNNFETTVFPFILRGASI 
IGIDSVFTPIKLRKRVWRRLAKDLKPQQLHDIKHVISFDEIPKAIDQVINHNNTGRIVID
FNV* 

Sequence 2829 
Contig_0460_pos_2826_2440

is similar to (with p-value 5.0e-19) 

>sp:sp|P26646|YHDH_ECOLI HYPOTHETICAL 34.7 KD PROTEIN IN MR
EB-ACCB INTERGENIC REGION (ORF1) (0324). >pir:pir|JS0688|JS0
688 hypothetical 35K protein (fabE 5' region) - Escherichia
coli >gp:gp|M80458|ECOACOAC_1 E.coli biotin carboxylase and
biotin carboxyl carrier protein (fabE) and ORF1 35 kDa prote
in genes, complete cds. NID: g145172. >gp:gp|U18997|ECOUW67_
183 Escherichia coli K-12 chromosomal region from 67.4 to 76
.0 minutes. NID: g606010. >gp:gp|AE000404|AE000404_7 Escheri
chia coli K-12 MG1655 section 294 of 400 of the complete gen
ome. NID: g2367207.

atgaaaggaaagacggttgtttcaaaattgttaccaccagtcatgccaattaatgcaaca 
cttccgttgttatccaatcgtttgacgatgtacggtaagttttcaccaccaactggatca 
atggctgcctgccatgttctcttctctaatggtttactattatcttctggtaaacggcca 
attacttctttagcaccaagctttttaagtttttcttcggcgtcatctctacccgtactt
gcaataacatcataccctaagttatttaacatgagtaatgaaatcgttccgacgcctcca 
gtggcacctcgtacaagcacttctttaccttcaatagacatacctgatttttcaagttgt 
tctatagctaaacctgctgtaaaatag 

Sequence 2830

MKGKTVVSKLLPPVMPINATLPLLSNRLTMYGKFSPPTGSMAACHVLFSNGLLLSSGKRP 
ITSLAPSFLSFSSASSLPVLAITSYPKLFNMSNEIVPTPPVAPRTSTSLPSIDIPDFSSC 
SIAKPAVK* 

Sequence 2831 

Contig_0463_pos_814_2568
is similar to (with p-value 7.0e-69) 

>gp:gp|Z47210|SPDEXCAP_7 S.pneumoniae dexB, cap3A, cap3B an
d cap3C genes and orfs. NID: g1658316.

atgataataagttatttacacaatataaataaattgaaaaactctaattttttagacttg 
actcaaaataaaaatggaaacatgatatactttaaaaaagaggtaatactaaaaatgata 
gataattggatagatgtattagatgaaagtttagtcaaagatttttataataatcaaact 
tccgaagagcaacaagaaggacttgatactacactgtcttttggcacggctggtattaga 
gggaaattcggtttaggcgaaggccgattaaataagttcaccgtatctaaagtagcgtta 
ggctttgcccattatttaacatcaagtatcgcgcatcctgtcgtcgtcatacattatgac 
acaagacacttatcacctgattttgctcaaattatcgctaatattctagcaagtcttgat 
attaaagtttatcttgctgatacatacagaacaacacctgatttatcatttgcagtcaga 
tacttacaggcggatgcgggtattatgattacagctagccataatcctaaagattataat 
ggaatcaaagtgtatggagaagatggtgctcaattatcaaccgacgattccgcacgacta 
agcacatatatcgataagttaggtcatccgcttcatattaatttacctagtttaactact 
gaacaacaaacattaattcattcagtcccgagcgaagttagagaagattatttcaaaaac
gtacaagacttagttggaactattccacagtctgatttgaaagttgtctttacaagcttg 
catggtacgagtgtgccagttgtacctgacatcttatcttctcttaactttaatcaattt 
gagttagttgcatcacaatgtgaacctgattcagatttcagctctgtagtcagtgcaaat 
ccagaggatcataaagcgtttgatcaatcgatagaacttgctaatcatattgatgctgat 
ttacttattggcacagatcccgatgcagaccgtttaggaatagttgaacgtgatgctgaa 
ggtaacatccactattacaacggaaatcagattggtgcacttttgttaaattatcgtatc 
aaacaaacagaaggattacctaacagaattatgttccaatcaattgtaagtggtggctta 
gctaaatctcttgctcaatatcataatgtcaatttcaaagaagttttgacaggttttaaa 
tatatcgcagctgaaataagacatctgtctcctgaacaaaactttatttttggctacgag 
gaaagttatggcttcttagcccgtcctttcgtgagagataaagatgcgattcaaattgtg 
ccattaatgattaagtatgcagctgaattaaaaaacaaaggacgcatgcttaaagatgaa 
ttagaagacattactcgaaatgtcggtaactttaatgacaagcttttctcacatacattt 
gai.'.ggte.otcaaggcaaggcaaaaatcgaaaatattatgactcaattcagaagtgaaaca 
ccttcggaaatgtgtggtcttaaagtcattgcaatcgaagattttgaaacaggtaaaaag
actgacttacaaaatgatgaagtcagcgatataactttacctaaagcgaatgtaataaag 

atatactttaatgaaggatttattgctttgcgtccttctggtacagagcctaaaattaaa 
ctttatgtatcactttcttgtgaccattttgacgtagttgcacaaaaaatgaatgatgct 
atatttaactcttaa

Sequence 2832 

MIISYLHNINKLKNSNFLDLTQNKNGNMIYFKKEVILKMIDNWIDVLDESLVKDFYNNQT 
SEEQQEGLDTTLSFGTAGIRGKFGLGEGRLNKFTVSKVALGFAHYLTSSIAHPVVVIHYD

TRHLSPDFAQIIANILASLDIKVYLADTYRTTPDLSFAVRYLQADAGIMITASHNPKDYN
GIKVYGEDGAQLSTDDSARLSTYIDKLGHPLHINLPSLTTEQQTLIHSVPSEVREDYFKN
VQDLVGTIPQSDLKVVFTSLHGTSVPVVPDILSSLNFNQFELVASQCEPDSDFSSVVSAN
PEDHKAFDQSIELANHIDADLLIGTDPDADRLGIVERDAEGNIHYYNGNQIGALLLNYRI
KQTEGLPNRIMFQSIVSGGLAKSLAQYHNVNFKEVLTGFKYIAAEIRHLSPEQNFIFGYE 

ESYGFLARPFVRDKDAIQIVPLMIKYAAELKNKGRMLKDELEDITRNVGNFNDKLFSHTF
EGTQGKAKIENIMTQFRSETPSEMCGLKVIAIEDFETGKKTDLQNDEVSDITLPKANVIK
IYFNEGFIALRPSGTEPKIKLYVSLSCDHFDVVAQKMNDAIFNS*

Sequence 2833 

Contig_0464_pos_2837_3322

is similar to (with p-value 4.0e-24)

>sp:sp|P74561|HIS4_SYNY3 PHOSPHORIBOSYLFORMIMINO-5-AMINOIMI
DAZOLE CARBOXAMIDE RIBOTIDE ISOMERASE (EC 5.3.1.16). >gp:gp|
D90916|D90916_42 Synechocystis sp. PCC6803 complete genome,
26/27, 3270710-3418851. NID: g1653715.

gtgaccactaagcctatagaagtgggtggcggcattcgttcaaaacaaacaattgaaaat 
tatattcattcaggaatagactattgtattgtaggtacaaaaggtatccaagatatagag 
tggttaacacatatgacacatcaatttccaaataaactctacttatccgtagatgctttt 
ggagagaaaataaagattaatggatggaaagaggatgctaaactcaatttatttgattat 

gttgccaaaattgagcatttacctttgggtggtgtgatttataccgatatttcgaaagat
gggajact ttctggacctaattttgatttgacaggtcgtctcgcactttatacatcgttg 
cctgtaattgcttcaggaggtattagacatcaagaggacttgtttcgattagaatcgtta 
aatgttcatgctgctattgtaggaaaagcagcacatctggatgaattctgggagggatta 
tcttga 


Sequence 2834 

VTTKPIEVGGGIRSKQTIENYIHSGIDYCIVGTKGIQDIEWLTHMTHQFPNKLYLSVDAF 
GEKIKINGWKEDAKLNLFDYVAKIEHLPLGGVIYTDISKDGKLSGPNFDLTGRLALYTSL 
PVIASGGIRHQEDLFRLESLNVHAAIVGKAAHLDEFWEGLS*


Sequence 2835 
Contig_0464_pos_4161_4709
is similar to (with p-value 5.0e-40) 
>sp:sp|P44434|HIS2_HAEIN PHOSPHORIBOSYL-AMP CYCLOHYDROLASE
(EC 3.5.4.19) / PHOSPHORIBOSYL-ATP PYROPHOSPHOHYDROLASE (EC
3.6.1.31). >pir:pir|A64071|A64071 phosphoribosyl-AMP cyclohy
drolase (hisIE) homolog - Haemophilus influenzae (strain Rd 
KW20) >gp:gp|U32730|U32730_6 Haemophilus influenzae Rd secti 
on 45 of 163 of the complete genome. NID: g3212191. 

atgaatgaagaagcttatcaaaaaactctgaaagaaaagaaagtaaccttcttctctaga
tctac»ac.uacgtttatggactaaaggtgaaacttctggtcatttccaacacgttg>gagt 
attcatctagattgtgatcaagatgcaatcttaatcaaagtgatgccacaaggtcctaca 
tgtcacactggaagtctgagttgttttaatagtgaaattgaatcgcgctttaaaattcaa 
gcattagcacaaacgattcatcaaagtgctaaaagcaatcaatctaactcttacactcaa 

tatttattaaaggaaggtatcgagaaaatatccaagaaatttggtgaagaggcatttgaa
gttgtgataggtgcgataaaacataatcgtgaagaagttattaatgaaacagcagatgtc 
atgtatcacctttttgtgttactacatagtttggatattccattttcagaagtagaacag 
gtactagcgcatcgccatcaaaaaagaaataattttaaaggcgagcgcaaaaaggttcaa 
gaatggtaa 

Sequence 2836

MNEEAYQKTLKEKKVTFFSRSKQRLWTKGETSGHFQHVESIHLDCDQDAILIKVMPQGPT
CHTGSLSCFNSEIESRFKIQALAQTIHQSAKSNQSNSYTQYLLKEGIEKISKKFGEEAFE 
VVIGAIKHNREEVINETADVMYHLFVLLHSLDIPFSEVEQVLAHRHQKRNNFKGERKKVQ
EW* 

Sequence 2837 

Contig_0465_pos_10122_9604
is similar to (with p-value 3.0e-49)

>gp:gp|Y13052|SSK3MECA1_3 S.sciuri mecA1 gene, strain K3 (MM
2). NID: g2791901.

gtgagcatatatattatgaatatatgcgagtcattttatactataatcaatcgcgcacga 
ttgaaaagaggtggagatgtgaaacaagaacaaatgaggttagcgaatcagctttgtttt 

tcagcatataatgtaagtcgtttatttgctcaattttatgagaaaaagttaaaacagttt
ggtataacttattctcagtatttagtattactgacgttatgggaagagaatcctcaaaca 
ttaaattcaattggtagacatttggatttatctagtaatactttaacccccttactaaaa 
agacttgagcaatctggctgggttaaaagagaacgtcaacaatctgataaacgacagttg 
ataattacgttaactgacaatgggcaacaacaacaagaagctgtttttgaagcaatttca 

agttgcttaccacaagaatttgatacgactgagtatgatgaaacgaaatatgtgtttgaa
gaactagagcaaacattaaaacatctcatagaaaaataa 

Sequence 2838 

VSIYIMNICESFYTIINRARLKRGGDVKQEQMRLANQLCFSAYNVSRLFAQFYEKKLKQF 
GITYSQYLVLLTLWEENPQTLNSIGRHLDLSSNTLTPLLKRLEQSGWVKRERQQSDKRQL
IITLTDNGQQQQEAVFEAISSCLPQEFDTTEYDETKYVFEELEQTLKHLIEK* 

Sequence 2839 
Contig_0465_pos_9362_8244

>sp:sp|P47169|YJ9F_YEAST HYPOTHETICAL 161.2 KD PROTEIN IN N
MD5-HOM6 INTERGENIC REGION. >pir:pir|S57160|S57160 sulfite r
eductase homolog YJR137c - yeast (Saccharomyces cerevisiae)
>gp:gp|Z49637|SCYJR137C_1 S.cerevisiae chromosome X reading
frame ORF YJR137c. NID: g1015875.

gtgaattggaagaatttaatgcattacaaagaacaagtcattaatcctatgtctgaaacc
ctcacatcgatgtttgaacaacagggaattgatgtaatcatggggaaaggtaaacttgta 
gatgctcatacaatagaggtaaataatacaactttacaatcagattatattgttatagca 
actggacaacatagtcatcaattagatattgagggtaaagaatatacgcatgatagtcgg 
gaatttttatcaatgcaatctttaccggatagtatcacttttattggagcaggtattatc 

agtattgaattcgcttctatcatgatcaaatcaggtgtagaggttaatgtggttcatcat
acaaatcatgcacttgaagggtttaacgaatcacacgtcaataaattaattcaaaagtta 
aaagatgeaggtgttaaattttactttagtgagaataccaagtcagttaaaccgafi^igca 
caacgttttatagtagaaactgagtctggaaagatgattgaaacagattatgtactggat 
gcaaccggtagaaagcctaatgttcagcaaataggtttggaaaaagtgggtatactattt 

agtgatagaggtattgaggttgacgattatttaagaacaaatgtgaaaaatatatacgca
agtggggacgttatcaataaaatgattcctaaacttactcctacagctacatttgagtct 
aattatatcgctgcccatatccttggattgaatacagatgccattcagtatccaccaata 
ccttcagtgctttattcattgcctcgtttatctcaaataggtgtcacagttagcgaggct 
aagaaagatgatacgtatatgattaaagatataccattcggaagacaaatggtatttgag 

tatcaaaacgaaacagaggctgaaatgtcaattgtattagatagtcacaaacgtttagta
ggagcagagatttatggtaatgacgctggtgatttggttaatctcctagtctttatcatt 
aatcaaaaacttactgcacaagacttaaataaaaatatttttgcatttcctggagcttct 
agtggcgttatagatttattgaaattggcaatgatgtag 

Sequence 2840

VNWKNLMHYKEQVINPMSETLTSMFEQQGIDVIMGKGKLVDAHTIEVNNTTLQSDYIVIA 
TGQHSHQLDIEGKEYTHDSREFLSMQSLPDSITFIGAGIISIEFASIMIKSGVEVNVVHH 
TNHALEGFNESHVNKLIQKLKDEGVKFYFSENTKSVKPNAQRFIVETESGKMIETDYVLD 
ATGRKPNVQQIGLEKVGILFSDRGIEVDDYLRTNVKNIYASGDVINKMIPKLTPTATFES 
NYIAAHILGLNTDAIQYPPIPSVLYSLPRLSQIGVTVSEAKKDDTYMIKDIPFGRQMVFE
YQNETEAEMSIVLDSHKRLVGAEIYGNDAGDLVNLLVFIINQKLTAQDLNKNIFAFPGAS 
SGVIDLLKLAMM* 

Sequence 2841

Contig_0465_pos_8159_7755

>pir:pir|A34231|A34231 sulfite reductase (NADPH) (EC 1.8.1.
2) - Salmonella typhimurium >gp:gp|M23007|STYCYSJIHA_1 S.typ
himurium NADPH-sulfite reductase flavoprotein component (cys
J), NADPH-sulfite reductase hemoprotein component (cysI), an
d 3' phosphoadenosine 5'-phosphosulfate sulfotransferase (cy
sH) genes, complete cds. NID: g153928.

gtgctcattatcgtaaatactgcaacgaattgtactctgaatgatcaatttaataaatta 
gaaacgctttataaaaaatatcataagtatggtcttgaaattttgagcttcccttgcaat
gattttaataatcaggagccagaattaatcaaagatatttatcgagtatataaatataag
tttggtattactttccccatccatgctaagattaatgttaatggagagcatgaacaccct 
ttgtacacattattaaaatgtaaacaaccaggattatttggttcgcaaattaaatggaat 
tttactaaatttgtagtagatcaacagggaaatattgttaaacgatttttaccttgtgat 
aatcctaaccaaatggaaaaattaataagacaattattaaaataa 


Sequence 2842 

VLIIVNTATNCTLNDQFNKLEMLYKKYHKYGLEILSFPCNDFNNQEPELIKDIYRVYKYK 
FGITFPIHAKINVNGEHEHPLYTLLKCKQPGLFGSQIKWNFTKFVVDQQGNIVKRFLPCD
NPNQMEKLIRQLLK* 


Sequence 2843 
Contig_0465_pos_6454_4568
is similar to (with p-value 1.0e-30)
>sp:sp|P52035|BSAA_BACSU GLUTATHIONE PEROXIDASE HOMOLOG BSA
A. >gp:gp|L77246|BACYACA_17 Bacillus subtilis (YAC10-9 clone
) DNA region between the serA and kdg loci. NID: g1256615. >
gp:gp|Z99115|BSUB0012_132 Bacillus subtilis complete genome
(section 12 of 21): from 2195541 to 2409220. NID: g26344''8.
atgacaaatatcggaataacaaagtataagaggtggataagattgaacttaagtgttacc 

aatagtccttttacggaagggcaagctaaacaaatcaatgaattgcttcacactttatca
tctaaccaacaagtatggttaagtggctatctgatggcaaatcaacaatcaaatacatct 
acagattctgtagaacaacataactcagatgacaatacagaagcgatgctacatgaaaag 
gaaccttcagttgagccggaagctagatcgataacaattttatatggatctgagtctggt 
aatgcgcaaggacttgcagaaatatttgaacaacgtttatctgatattggaaatgacgtt 

acgctcaaatcaatggatgactttaaacctaagaatttgaagaaagtagaagatttattt
attatcacatctacacatggcgaaggagatccaccagataacgctgttgaattacatgaa 
tacatccacggacgtaaagcgccaaaattggatggggtgagattttcagtattagcatta 
ggagatcaaacctatgaattcttctgtcaaactggtaaagattttgataaccgtttggct 
gaattaggggcagaaagactttatcatcgtactgattgcgatgttgattatgaagaagac 

gctgaaaagtggatggccaatgttattaatacgattgattcaacaccggctggtactgaa
agtgaacaagttgttagcgaatctataaaatcagcaaaagaaaagaaatatagtaagtct 
aatccttacgatgctgaagtattgaccaatattaatttgaatggtagaggttcagacaaa 
gagacgagacatattgagttattacttgataattttggtgaggaatatgaaccgggagat 
tgtgtagttgtcttgcctcaaaatgatccagctatcgtagacttacttattagcacatta 

ggttggagtccagaaacacaagttttaattaacgaggatggagatactttaaatcttgaa
gaggcattaacatcgcattttgaaattactaaattaacaaaaccgttaatagaaaatgct 
gcgatattttttgataatgaagagctttctgaaaaaattcaagataaagaatggattcaa 
aac^acg+-tgagggaagggatttgattgacttattaaatgacttcgcaacgacag<:<. jicta 
caacctgaaaatttacatcaattattaagaaagttaccacctagagagtactcaacatct 

agtagttataaagcaacaccagatgaagttcacattactgttggagcagttagatatcaa
gcacacggtcgggaacgttcaggtgtttgttcagtacaatttgcagagagaatacaagag 
ggcgatacaattcctatctatttaaaacgaaatccgaattttaagtttccgcaagatgaa 
tcaacacctgtgattatgataggtcctgggacaggtgttgcaccgtttagatcctatatg 
caagaacgagaggaactaggttttgaaggaaatacatggttattctttggagatcaacac 
ttcactacagattttctgtatcaaacggaatggcaagaatggcttgaagatggaacttta
tcaaaattagatgttgctttttctagagatactgataaaaaagtgtatgtgcaacataaa 

attgcagaaaatagtgaacaatttaatcgatggattgaaaatggcgctactatttatgta 
tgtggtgatgaaagtaaaatggcaaaggatgttcatcaagcgattaaaaatgtgttaatc 
aaagagcaaaacctatctgaaacagatgcagaagaatacttaaaacaaatgaaaagagat
aaaagatatcaaagagacgtgtattaa 

Sequence 2844 

MTNIGITKYKRWIRLNLSVTNSPFTEGQAKQINELLHTLSSNQQVWLSGYLMANQQSNTS 
TDSVEQHNSDDNTEAMLHEKEPSVEPEARSITILYGSESGNAQGLAEIFEQRLSDIGNDV
TLKSMDDFKPKNLKKVEDLFIITSTHGEGDPPDNAVELHEYIHGRKAPKLDGVRFSVLAL 
GDQTYEFFCQTGKDFDNRLAELGAERLYHRTDCDVDYEEDAEKWMANVINTIDSTPAGTE
SEQVVSESIKSAKEKKYSKSNPYDAEVLTNINLNGRGSDKETRHIELLLDNFGEEYEPGD 
CVVVLPQNDPAIVDLLISTLGWSPETQVLINEDGDTLNLEEALTSHFEITKLTKPLIENA
AIFFDNEELSEKIQDKEWIQNYVEGRDLIDLLNDFATTELQPENLHQLLRKLPPREYSIS
SSYKATPDEVHITVGAVRYQAHGRERSGVCSVQFAERIQEGDTIPIYLKRNPNFKFPQDE 
STPVIMIGPGTGVAPFRSYMQEREELGFEGNTWLFFGDQHFTTDFLYQTEWQEWLEDGTL 
SKLDVAFSRDTDKKVYVQHKIAENSEQFNRWIENGATIYVCGDESKMAKDVHQAIKNVLI
KEQNLSETDAEEYLKQMKRDKRYQRDVY* 


Sequence 2845 

Contig_0465_pos_4548_2830

>gp:gp|Y13052|SSK3MECA1_4 S.sciuri mecA1 gene, strain K3 (MM
2). NID: g2791901.

atggttaatacaaataatcatatttcggaagaattagataaaaatcttgatgaaatggaa
tttttaaaagcaaatagtgactttttgcgtggaactattgaacaaagtttagctaatcca
atcactggatccattacacaagatgatgcaaaactgctaaagtttcacggaagttatatg
caagatgacagggatttaagagatgagcgtcgtaaacaaaaacttgagcctgcatatagt 
tttatgattcgagttcgtgtacctggggggaaagcgactcctgaacagtggattgctatg 

gatgatatctctaatcaatatgcaaatcatacgattaaattaacaacacgccaagcattt
caatttcatggaattcttaaacgtaatttgaaacaatcaatgaaaaatattaatcatgca 
gtacttgattctattgctgcatgtggagatgttaatcgtaatacgatgtgcaatcctaat 
ccttatcaatctcaagtacataaggagattaatgattatgcaacgcgtataagtaatcac 
ttacttccaagaacaaatgcatatcatgaaatttggcttgatggtgaaaaggttttagat 

tcgagtgaggaaaaggaacctatttatgggaatacgtatttaccacgtaaattcaaaata
ggtattgcagtaccaccatctaatgatattgacgtctattctcaagatattggtttaatc 
gctatcgttgaacaagatgagttaattggatttaatgtgactatcggtggcggtatgggt 
atgactcatggtaatactgaaacatatcctcaacttggacgtctcataggttttatacct 
aaggaaaaggttgtagatgtatgtgagaaaatacttacaatacaacgtgattatggtaat 

cgtgaaaatcgaaaaaatgcacgttttaaatatacagtggaccgtctaggagaaacttgg
gtgactgaagaattaaaccgacgattaggttgggaaattaaagcgccacgtgatttcgaa 
tttgaacataatggtgatcgattaggttggattgaaggtattaataattggaatttcact 
ttatttatacaaaatgggcgtgtgaaagatactgaagactatttgttaaaaacaacctta 
agagaaatcgcagaaatccatactggagatttcagattatcacctaatcagaacttagtt 

attgcaaatgtttctcctgagaaaaaggaagaaatacaagctattattgataaatataaa
ttaacagatggcaaaaattatacaggacttagaagaaattctatggcttgtgttgctttc 
ccaacgtgtggtttagctatggcagaatctgaaagatatcttccttcactaattacaaaa 
attgaagatttattagatgagtctggtttaaaagaggaagaaataacgattcgtatgaca 
ggttgtcccaatggatgtgcgagaccagcgctagcagaaatagcctttatcggtaaagca 

cctggtaaatataatatgtacttaggtggtagttttaaaggcgaacgtctaaataaaata
tataaagagaatatcgacgaaaatgagatattagaaagtctacgtccattgttgttgcgt 
tatagtaaagagcgtcttgacggagaacactttggggactttgtaattcgtgacggtgtg 
atagccaaagttcatgatggtcgcgattttcatagttaa 

Sequence 2846

MVNTNNHISEELDKNLDEMEFLKANSDFLRGTIEQSLANPITGSITQDDAKLLKFHGSYM
QDDRDLRDERRKQKLEPAYSFMIRVRVPGGKATPEQWIAMDDISNQYANHTIKLTTRQAF 
QFHGILKRNLKQSMKNINHAVLDSIAACGDVNRNTMCNPNPYQSQVHKEINDYATRISNH 
LLPRTNAYHEIWLDGEKVLDSSEEKEPIYGNTYLPRKFKIGIAVPPSNDIDVYSQDIGLI 
AIVEQDELIGFNVTIGGGMGMTHGNTETYPQLGRLIGFIPKEKVVDVCEKILTIQRDYGN
RENRKNARFKYTVDRLGETWVTEELNRRLGWEIKAPRDFEFEHNGDRLGWIEGINNWNFT 
LFIQNGRVKDTEDYLLKTTLREIAEIHTGDFRLSPNQNLVIANVSPEKKEEIQAIIDKYK 
LTDGKNYTGLRRNSMACVAFPTCGLAMAESERYLPSLITKIEDLLDESGLKEEEITIRMT 
GCPNGCARPALAEIAFIGKAPGKYNMYLGGSFKGERLNKIYKENIDENEILESLRPLLLR
YSKERLDGEHFGDFVIRDGVIAKVHDGRDFHS*

Sequence 2847

Contig_0467_pos_7413_6943

is similar to (with p-value 2.0e-17)

>sp:sp|P45637|YPRA_CORGL HYPOTHETICAL 33.0 KD PROTEIN IN PR
OB-PROA INTERGENIC REGION. >gp:gp|U31230|CGU31230_3 Coryneba
cterium glutamicum Obg protein homolog gene, partial cds, ga
mma glutamyl kinase (proB) gene, complete cds, and (unkdh) g
ene, complete cds. NID: g950194.

atgaaagtaataggggttagtaagtcaggaaaaaatgttgaacaatttgatgaagt titat 
acaar ty^jagagttagatgatgttattgaaaaggcaaatattattgttaatgcat :acct 
gaaacagaagaaacaatttacttattaaaaagaaaagatttcatacaaatggacaataat 
gccttatttataaatgtaggaagaggaacaattgttgatgaggaagtgctcatcaatgta 

ctcaaagatcgattaatcagacatgcttatttagatgtttttgaaaaagaaccacttagt
aaggacaatcctttatatgatttagataatgtgaccataactgctcatattacaggtaat 
gattctaataataatagagaagctacggacattttcaaaaagaatcttgagcattttctc 
aataattatgatgtaattgagaataaagtagacttagattatggttactaa 

Sequence 2848

MKVIGVSKSGKNVEQFDEVYTIEELDDVIEKANIIVNALPETEETIYLLKRKDFIQMDNN 
ALFINVGRGTIVDEEVLINVLKDRLIRHAYLDVFEKEPLSKDNPLYDLDNVTITAHITGN 
DSNNNREATDIFKKNLEHFLNNYDVIENKVDLDYGY* 

Sequence 2849

Contig_0468_pos_13714_9209

>sp:sp|P39812|GLTB_BACSU GLUTAMATE SYNTHASE [NADPH] LARGE C
HAIN (EC 1.4.1.13) (NADPH-GOGAT). >gp:gp|Z99113|BSUB0010_138
Bacillus subtilis complete genome (section 10 of 21): from
1781201 to 2014980. NID: g2634090. >gp:gp|Z99114|BSUB0011_9
Bacillus subtilis complete genome (section 11 of 21): from 2
000171 to 2207900. NID: g2634230.

gtgtgcatcatgtacaatgagaaattaaaaaagggactatacgattatcgtgaagagcat 
gatgcgtgtggtattggattttatgccaatatggataataaaagatctcacgatattata 

gaaaaatctttagaaatgttaagacggttagatcatcgtggtggagtaggtgccgatggt
attactggtgatggtgcaggaattatgacggagataccataccaattgttcgaacaatta 
acagaattcaaagttcccggcgaaggatattatgccgtgggattatttttttctaaagag 
aaagttagagattcaattcacgaagagatgtttaatcaatattttgaaagtgaaggtttt 
aaagtcattggatatagagatgtgccagtagatactcgcgctattgctcaacatgttgca 

gatactatgccttatattcaacaagtatttgttgacatcacaggtgtaaaagaagttgaa
aaacgattgtttttagcaagaaagcaaattgaaaaatatagtgaaacacaatccatagat 
ttatattttacaagtctctctcatagaacgattgtttataaaggttggttacgttcggat 
caaattaaaggcttatatttagacctacaaaatgaggcatatcaatcaaaattaggactt 
gtacactcccgctttagtactaatacatttccaagttggaaacgtgcacatcccaatcgc 

atgcttatgcacaatggtgaaattaataccattaagggtaacgtaaactggatgcgagca
cgccaaaataaactagttgaaacattatttgaagatgagaaagataaggtgcattttatt 
gttgatgaagatggtagtgactcatcaatagttgataatgcgttagagttcttatcatta 
gcaatggagcctgaaaaagcagcgatgttattaattccagagccatggttatacaatgaa 
tctaacgataaaaaagttcgctctttctatgaattttatagttatttaatggagccatgg 

gatggaccaactatgatttccttttgcaatggagataagataggtgcattgactgataga
aat.3a.^tlaagacctgggcgttatacaataactaaagacaattttattgtttttt*:ttcc 
gaagtaggtgtcattgatgttccagaagaaaatgtagcatttaaaggacaacttacjtcct 
ggaaagttattacttgtagactttttgcaaaataaggttgtagaaaataatgagctaaaa 
actaatattgctaatgagttgccctacgaacaatggctaaaagattataaaaataaaaat 
gatttagataatatttattaccaatcttccgactgggatgatcaaacactcttccgctta
cagaaacaatttgcttacactaaagaagatatcaataaatatatgacagatttagtcatc 
aataaaaaagatcccatcggagcgatgggatatgatgcacctattgcagttcttaatgat 
aagcctgagtcactatttaattattttaaacaattatttgcacaagttaccaatccaccc 
atcgatgcttatcgagaaaagattgtaactagtgaactttcatatttaggttcagaaggg
aacctcttatgtcctgatgaatcagttttagaaagaattcaattaaaaaaaccagtttta 
aatgaagcgcaattatcatcaatcgatcattcgtattttaatgtaacgtatttatctaca 
ctttatacaggtgatttggaaagtagcttaaatgaactagggaaccgagcaatacaggct 
gtacatgaaggtgcgaaaattttggtgttagacgatacgtctttaactcacgaaaatagt 

tatgcaatgccaattttattagcgttaagtcacgtgcatcaattattaattcgagaagga
ttaagaatggagaccagcctcattgcgcagtccggtgaaacacgagaagttcatcacgtt 
gcatgtttacttggttatggtgcaaacgctgtagttccatatttagcgcaacgaacgatt 
gaacaattaacgcgtcaaggtcaactttcaggaactgtcgctgaaaatgttgctacgtat 
accaatgtattgtcagaaggcgttattaaagtgatggctaaaatgggcatttctactgta 

caaagttatcaaggagcacagatatttgaagcggtaggtttatcgaatagcgtcattgaa
aaatattttacaggtacacagtcaaaattatctggtataagtattgaacaaatagacaaa 
gagaataaagcgagacaaagtgatgattctgattatcttgaatccggaagtgtattccaa 
tggagacagcaaggtcagcatcatgcatttaatcctcgtacgatttttttattgcagcat 
gcatgtagagaaaatgattacgagttatttaaaaaattctccaaaactgtaaatttaaaa 

cgtacggatcatattagacatttattagaattcaagacacgccaatctattgatattagt
cgtgttgaaccagcaagtgaaatcgtaaaacgttttaatacaggagcaatgagttacggc 
tctatctcagcagaggcacatgagacgttggctcaagctatgaatcaaattggaggtaaa 
agtaatagtggagaaggtggagaagattcttcacgttacgaaattcaaaaggatggaagt 
aataagataagtgcgattaagcaagttgcatcaggtcgttttggggtgacgagtgattac 

ttgcaacatgcaaaagaaattcaaattaaagtcgcacaaggcgctaaaccaggggaaggt
ggacaactaccaggttcaaaagtatatccatggattgctgagactagaggttcgacacca 
ggtataggattaatttcaccaccaccacaccatgatatttattcaattgaggacttagca 
caijc-: i.tcatgatttaaaaaatgcaaatagaagagctgatattgcagttaagcV cgta 
tcaaaaactggcgttggaactatagcttcaggggtagctaaagctttcgccgataaaatt 

gttataagtggttatgatggaggtacaggtgcatcgcctaaaacaagtattcaacatgca
ggtttgccatgggagataggccttgccgaaacacatcaaacacttaaattaaatgatttg 
cgtagtcgcgtaaaattagaaacggatggtaagttactgacgggtaaagatgtagcttat 
gcttgtgcgcttggtgcagaagaatttggtttcgcaacagcaccacttgttgttttgggg 
tgtattatgatgagggtttgtcataacgatacgtgtccagtaggggttgcaacacaaaac 

aaagatttaagagctttgtttagaggtaaggcacagcatgtagttaactttatgtatttt
atagctgaagaattacgtgaaattttggcttcacttggtttagaaacagtagaagagtta 
gtaggaagaacagatcttcttcaacgttcgacgcaattgaaaccaaatagtaaagcagct 
tcgcttcaaatagaacgtttaatagaacaatttgacggggttaatacgaaagagatatca 
caaaaccatcatcttgatgaaggattcgatttgaattatctgtacccagacgcacgctat 

agtattgaaaacgggcactcttttaccggaaattatgttgttaataatgaacagcgagat
gtaggtgtaattacaggtagtgcgatagctaaacaatatggagaagaaggattacctgaa 
gatacgatacttgcttacactgaaggtcatgcaggtcaaagcttagctgcatatgcacca 
cgcggattaacaatccatcataccggtgatgctaatgactacgtaggtaaaggattgtcc 
ggtggaactgtcatcgtaaatgctccaaatagtcaacgtgaaaatgaaattatagcagga 

aatataaacttttacggggcttctagaggtaaagcgtttatcaatggtaaagctggtgag
cgtttctgtatcagaaatagtggtgcagatgttgtagtagaaggtattggtgatcatgga 
cttgaatatatgacagggggacatgtcattatcttaggagatgttggaaagaactttggc 
caagcfcaLgagcgggggcgtaagttatattttctcttctgacgtggagaaatttaoaaag 
gttaatgcgcttgaaactttagaattcagtagcatacgttttgatgaggaaaaatctctt 

atcaaagacatgcttgaagcacattttaagcatacacgtagtaacaaagcacgccaatta
cttgaccaatttgacaatattgaaaagttagcaattaaagttattccgaaagattacaaa 
ttaatgatgcaaaaaattgatttgaaaaaacgtcaaatggaacgtgaagatgaagcaaca 
ctggcagcgttttatgatgacagagaaacaattgaacaagagctacagccagcagtcatt 
tattaa 


Sequence 2850 

VCIMYNEKLKKGLYDYREEHDACGIGFYANMDNKRSHDIIEKSLEMLRRLDHRGGVGADG 
ITGDGAGIMTEIPYQLFEQLTEFKVPGEGYYAVGLFFSKEKVRDSIHEEMFNQYFESEGF
KVIGYRDVPVDTRAIAQHVADTMPYIQQVFVDITGVKEVEKRLFLARKQIEKYSETQSID 
LYFTSLSHRTIVYKGWLRSDQIKGLYLDLQNEAYQSKLGLVHSRFSTNTFPSWKRAHPNR
MLMHNGEINTIKGNVNWMRARQNKLVETLFEDEKDKVHFIVDEDGSDSSIVDNALEFLSL 
AMEPEKAAMLLIPEPWLYNESNDKKVRSFYEFYSYLMEPWDGPTMISFCNGDKIGALTDR 
NGLRPGRYTITKDNFIVFSSEVGVIDVPEENVAFKGQLNPGKLLLVDFLQNKVVENNELK 
TNIANELPYEQWLKDYKNKNDLDNIYYQSSDWDDQTLFRLQKQFAYTKEDINKYMTDLVI
NKKDPIGAMGYDAPIAVLNDKPESLFNYFKQLFAQVTNPPIDAYREKIVTSELSYLGSEG 
NLLCPDESVLERIQLKKPVLNEAQLSSIDHSYFNVTYLSTLYTGDLESSLNELGNRAIQA 
VHEGAKILVLDDTSLTHENSYAMPILLALSHVHQLLIREGLRMETSLIAQSGETREVHHV 
ACLLGYGANAVVPYLAQRTIEQLTRQGQLSGTVAENVATYTNVLSEGVIKVMAKMGISTV 

QSYQGAQIFEAVGLSNSVIEKYFTGTQSKLSGISIEQIDKENKARQSDDSDYLESGSVFQ
WRQQGQHHAFNPRTIFLLQHACRENDYELFKKFSKTVNLKRTDHIRHLLEFKTRQSIDIS 
RVEPASEIVKRFNTGAMSYGSISAEAHETLAQAMNQIGGKSNSGEGGEDSSRYEIQKDGS 
NKISAIKQVASGRFGVTSDYLQHAKEIQIKVAQGAKPGEGGQLPGSKVYPWIAETRGSTP 
GIGLISPPPHHDIYSIEDLAQLIHDLKNANRRADIAVKLVSKTGVGTIASGVAKAFADKI 

VISGYDGGTGASPKTSIQHAGLPWEIGLAETHQTLKLNDLRSRVKLETDGKLLTGKDVAY
ACALGAEEFGFATAPLVVLGCIMMRVCHNDTCPVGVATQNKDLRALFRGKAQHVVNFMYF 
IAEELREILASLGLETVEELVGRTDLLQRSTQLKPNSKAASLQIERLIEQFDGVNTKEIS
QNHHLDEGFDLNYLYPDARYSIENGHSFTGNYVVNNEQRDVGVITGSAIAKQYGEEGLPE 
DTTLAYTEGHAGQSLAAYAPRGLTIHHTGDANDYVGKGLSGGTVIVNAPNSQRENEIIAG 

NINFYGASRGKAFINGKAGERFCIRNSGADVVVEGIGDHGLEYMTGGHVIILGDVGKNFG
QGMSGGVSYIFSSDVEKFKKVNALETLEFSSIRFDEEKSLIKDMLEAHFKHTRSNKARQL 
LDQFDNIEKLAIKVIPKDYKLMMQKIDLKKRQMEREDEATLAAFYDDRETIEQELQPAVI
Y* 

Sequence 2851

Contig_0468_pos_9170_7728

is similar to (with p-value 5.0e-25) 

>sp:sp|P33019|YEIHECOLI HYPOTHETICAL 36.9 KD PROTEIN IN LY
SP-NFO INTERGENIC REGION. >gp:gp|U00007|ECOHU47_49 47 to 48
centisome region of E.coli K12 BHB2600. NID: g453983. >gp:gp
|AE000305|AE000305_3 Escherichia coli K-12 MG1655 section 19
5 of 400 of the complete genome. NID: g1788479.
atgaaatatgataaacagtcgctatcagaattgtctttggtagaccgtctttcgaatcat 
gaagcgtttcaacaacgcttcactaaagaagatgcttcgattcagggtgcgcgctgtatg 

gattgtggaacacctttttgtcaaactgggcaatcttatggaagagaaacaataggatgc
cctattggtaattatatacctgagtggaacgacttagtctatcatcaagattttaaagct 
gcttacgaaagattgagagagacgaataattttcctgaatttacaggaagagtttgtcct 
gcaccatgtgagcaatcatgtgttatgaaaattaatagagaatccgtggcgattaaaggt 
at-*, giaccftacaattattgatgaagcatatgagaatgagtgggttcatcccgcat.iccct 

gaagatcataaagaccaacgagttgctatcgtaggtagtggtccagcgggacttacagca
gctgaagaattaaactttaaaggctataaagttactgtttatgaaaaggcgcatgaacca 
ggcggcttgctaatgtatggtataccaaatatgaaactagataaagacgtaatacgtcga 
cgtgtatcacttatgaaagatgctggggttttatttaaaacaggcgttgaaattggcgtc 
gatgtgagccgtgaaacacttgaagaaaattatgatgctattattttatgcacaggtgct 

caaaatgcgagagatttaccattggaaggacgaatgggctctggtattcattttgcaatg
gactatcttactgaacaaacacagtatctaaatggtgagattgaaagtttgagcattact
gctaaagataagaatgtaattattataggtgctggtgatactggtgcagactgtgtagcg 
acagcattacgtgaaaactgtaaatctattgttcaatttaataaatatacgaaacagcct 
gaagagattacttttgaaagtaatacttcctggccattagcaatgcctgttttcaaaatg 

gattatgcgcataaggaatatgaagctaaatttggtcaagaaccaagagcctatggtgta
caaacaatgcgctatgatgttgacgagttaggaaatgttaaaggcttatatacacaaata 
ttaaaagaaacgcctgatggcatggtgatggaagatggaccagaacgattttggccggct 
gatttagtcttattatctatagggtttgttggtactgaaaccactgttccgcatgcgttt 
gatatacacaccgagcgtaataaaattgtagctaatgatacaaattatcaaactaatcac 

gctaaaatatttgctgcaggagatgcaagacgaggtcagagtttggttgtttgggcaata
aaagaaggtcgtgaagtagcacattctgttgatcaatacttaagtaaagaagttctagtg 
taa 

Sequence 2852 

MKYDKQSLSELSLVDRLSNHEAFQQRFTKEDASIQGARCMDCGTPFCQTGQSYGRETIGC
PIGNYIPEWNDLVYHQDFKAAYERLRETNNFPEFTGRVCPAPCEQSCVMKINRESVAIKG 
IERTIIDEAYENEWVHPAYPEDHKDQRVAIVGSGPAGLTAAEELNFKGYKVTVYEKAHEP
GGLLMYGIPNMKLDKDVIRRRVSLMKDAGVLFKTGVEIGVDVSRETLEENYDAIILCTGA 
QNARDLPLEGRMGSGIHFAMDYLTEQTQYLNGEIESLSITAKDKNVIIIGAGDTGADCVA
TALRENCKSIVQFNKYTKQPEEITFESNTSWPLAMPVFKMDYAHKEYEAKFGQEPRAYGV 
QTMRYDVDELGNVKGLYTQILKETPDGMVMEDGPERFWPADLVLLSIGFVGTETTVPHAF 
DIHTERNKIVANDTNYQTNHAKIFAAGDARRGQSLVVWAIKEGREVAHSVDQYLSKEVLV 
*


Sequence 2853 
Contig_0468_pos_6592_5564
>gp:gp|D85230|PEEGLTD_4 Plectonema boryanum URF141, ORF243,
NADH-dependent glutamate synthase large subunit (gltB) and
small subunit (gltD) and URF289 genes, partial and complete
cds. NID: g1339947.

gtgctaagatttcatatcgaggtgaaagtacatatgaaatcaataacgcaggcttcattt 
atgaaaggtattatgtttacatttacgattgcaataatcagttatatattagctaaattt 
cctattttacatacgattggggcgttagctattgccatcatttttgcgatgatataccgc 

caagtcataggttatcctgagcatattcgcccaggtattacgtttgcatcgaaacgttta
ttaaaatttgcgattatcttatatgggttaaaattaaatatgggagatattctaggtaaa 
ggttggaaattactacttattgatattatcgtaattatcttttcaataagtttaacttta 
cttttgaatcaaattattaaaggaaataaagatatctctatactacttggtattggtaca 
ggagtatgtggagctgcagctattgcagctacagcaccaatcttaaaatctaaagaaaaa 

gacattgcaataagtgtaggtattattgcactagttggaactatatttgcacttatttat
acagctatcgaggctatttttaacatacctactataacttatggtgcttggacaggtatc
agtctacatgaaatcgctcaagtcgttttagcagcaggtattggcgggtcggaggcaatg 
acatttgctttacttggaaaattaggccgtgtgtttttacttattccattaagtattgtc 
ttgattttgtatatgcgttataagtcacactcaagtcaagtacaacaaaaaatcgatatt 

30 ccttactttctaattggatttattataatggcttgtatcaatacatttgttcctat ^:cct 
tcatf.acttatgaatattataaatgttattacaacgttatgtatgttaatggcgatggtt 
gctctaggattgaatatcgttttaaaagaagttatttcaaaagcacttaaaccattcatt 
gtgatctgtataacttcaatttgtctgtctggtgtgactctcttagttacgtctataatg 
tttaaataa 


Sequence 2854 

VLRFHIEVKVHMKSITQASFMKGIMFTFTIAIISYILAKFPILHTIGALAIAIIFAMIYR 

QVIGYPEHIRPGITFASKRLLKFAIILYGLKLNMGDILGKGWKLLLIDIIVIIFSISLTL
LLNQIIKGNKDISILLGIGTGVCGAAAIAATAPILKSKEKDIAISVGIIALVGTIFALIY
TAIEAIFNIPTITYGAWTGISLHEIAQVVLAAGIGGSEAMTFALLGKLGRVFLLIPLSIV
LILYMRYKSHSSQVQQKIDIPYFLIGFIIMACINTFVPIPSLLMNIINVITTLCMLMAMV
ALGLNIVLKEVISKALKPFIVICITSICLSGVTLLVTSIMFK*

Sequence 2855 

Contig_0469_pos_5825_5232

is similar to (with p-value 2.0e-40) 

>gp:gp|Z67739|SPPARCETP_2 S.pneumoniae parC, parE and trans
posase genes and unknown orf. NID: g1490398.
atgttaatcttgagttatctgattggtgcattcccaagcgggttaattattggtaaatta 

50 ttttttaaaaaagatataagacaatacggtagtggaaatactggagcaactaacacjtttt 
cgtgttcrtggaagaccagctggatttatagttacgtttttagatattttcaagg^attt 
attacagtcttttttccactatggttcccagttcatgcggatggtgttataagcaccttc 
tttacaaatggtttaatagtaggattgtttgcaatactcggtcacgtgtatccaatatat 
ctgaaatttaatggcggaaaagcagtagctaccagtgcaggagttgtattaggtgtcaat 

cctattttacttcttatcttggcaattatcttttttagtgtattaaaaatctttaaatat
gtttctttatcaagtatcattgcagcaattagttgtgtgattggttcaatcatcattcat 
gattatattttacttgctgttagcggaattgtttcaatcatattaataattcgacacaaa 
tctaatatagttagaatttttaaaggagaagaacctaaaattaaatggatgtaa 

Sequence 2856

MLILSYLIGAFPSGLIIGKLFFKKDIRQYGSGNTGATNSFRVLGRPAGFIVTFLDIFKGF
ITVFFPLWFPVHADGVISTFFTNGLIVGLFAILGHVYPIYLKFNGGKAVATSAGVVLGVN
PILLLILAIIFFSVLKIFKYVSLSSIIAAISCVIGSIIIHDYILLAVSGIVSIILIIRHK
SNIVRIFKGEEPKIKWM*

Sequence 2857 
Contig_0470_pos_7843_9180
is similar to (with p-value 3.0e-97)
>gp:gp|Y13052|SSK3MECA1_1 S.sciuri mecA1 gene, strain K3 (MM
2). NID: g2791901.

atgagtaaggttaatcatttgatagttgaagatgaacgttattttgcacattctggacgg 
attaaatattatccactagttattgatcatggttatggagccacgttaatcgatgtagac 
ggtaagtcttatattgatttattagcaagtgcaagttcgcaaaatgtgggtcacgctccg 

aagccagtagtcgaagcaattaagaaccaaactgagaaattcattcattatacaccagca
tatatgtatcatgaaccgttagtgcgattatcaaaaaaattatgtgacattgctcctgga 
aattatgagaagagagttactttcggattaagtggctctgatgctaatgatgggattata 
aagtttgcacgagcgtatacaggacgtccatacatcataagttttactaatgcataccat 
ggttcaacatttggttcattatcgatgtcttcaattagtttgaatatgcgtaagcattac 

gggccattacttaatggattttaccatataccttttcccgataagtatagggggatgttt
gagcaagctaagcctaacacggttgaggaatatttagctcctttaaaagaaatgtttgca 
aaatacgtccctgcggaggaagttgcatgtattgtagttgaaacaattcaaggtgacggt
ggcttacttgaacccgtaccaggttattttgaagcattacaagagctttgccacgctcac 
aatattcttattgcagttgatgatatccaacaaggattaggtcgtacaggaaagtggagt 

tccgtagatcattattattttactccagatttaatgacatttggaaagtcattagctgga
ggtttaccaatgtctgcgattgtaggtcgtaaagaaatcatggaaagtcttgaagcacct 
gctcatttatttacaactggtgcaaatcctgtaagttgcgaagcagccttagcaacgata
aagatgattgaagatgaagatttactaaacgcttcatggaaaaaggggagttacgttaga 
aaaagaatagacccatggatagaacgttatcaatatgtaggtgatgttcgaggtattgga 

ttatcgattggaatagacatagtatcaaataaaattgagaaaactagagattctgaagca
gcattaaagatatgtaattactgctttgaaaatggtgtgattatcatagcagttgcgggt 
aatgttttaagatttcaaccaccacttgtaattacctataagcaacttgataaagcatta 
gatacaatagaacaggcgcttgaaaagttggaaagaggagaattaaatcaatatgacatt 
agtggtcaaggttggtaa 


Sequence 2858 

MSKVNHLIVEDERYFAHSGRIKYYPLVIDHGYGATLIDVDGKSYIDLLASASSQNVGHAP 
KPVVEAIKNQTEKFIHYTPAYMYHEPLVRLSKKLCDIAPGNYEKRVTFGLSGSDANDGII
KFARAYTGRPYIISFTNAYHGSTFGSLSMSSISLNMRKHYGPLLNGFYHIPFPDKYRGMF 
EQAKPNTVEEYLAPLKEMFAKYVPAEEVACIVVETIQGDGGLLEPVPGYFEALQELCHAH
NILIAVDDIQQGLGRTGKWSSVDHYYFTPDLMTFGKSLAGGLPMSAIVGRKEIMESLEAP 
AHLFTTGANPVSCEAALATIKMIEDEDLLNASWKKGSYVRKRIDPWIERYQYVGDVRGIG
LSIGIDIVSNKIEKTRDSEAALKICNYCFENGVIIIAVAGNVLRFQPPLVITYKQLDKAL 
DTIEQALEKLERGELNQYDISGQGW* 


Sequence 2859 

Contig_0471_pos_8235_6703

>gp:gp|Z46863|ACRBDOXN_10 Acinetobacter sp. cysD, cobQ, sod
M, lysS, rubA, rubB, estB, oxyR, ppk, mtgA, ORF2 and ORF3 ge
nes. NID: g2462044.

atggaagaaagaattggtttgatagacattggttccaacacgattcgacttgttatattt 
ggctacaataaaaaaactgggctcaatgaaatactgaatataaaaacacctgcacgttta 
agtcaatatctcactaagtccaatgaaatgaatgatgaaggtattcatgttttaaaagag 
actttaagcagttttagaaaagttgcggataaatttaacgttgatgcattatatcccatc 

gcaacagctgctatccgtcaatctaaaaatcgtgaagctatcattaaagaaattaaacaa
gatattcatatcgaaattcaaattgtacctgaagaagatgaagcattttacggttactat 
gcgattacacatactactgatattgaaaatggaatttctgtcgatatcggtggcggttct 
accgaagttacccttttcaaagacaaacaacttaaagaggctcatagctttccatttggc
gtggtatcacttaagcgtcagttttttggtgataaagaacataatgacaaaacagccatt 
aaaaatatggaacagtttttacgtgagcaatttagtcagttagactggctatccaaccaa
catattgcgctcgtaggagtaggtggttctgcacgtaatgttgcacgcattcatcaatca 
gcacatgcataccctatcggtggcgttcataactatacgatgacttcgaaagatattaac 
aatgtttatgatctaatccgtaaaagttctcgcgatgaactaacaaatttagacggttta 
agtcgcgatcgcgtcgatattattctgccggctatctccgtctttaaaacacttttcaaa
aaaattgacgccacacaattcaccttctcaagaaaaggaattcgtgaaggatttattatg 
aaccacatcagcaaacgatatcctgatgagtttaataaaagtaacgtacgtaaagatgca 
ttacgtcatttagcgaacgaatatcatattgaagaaacgagtgctaatcgtcgtgtgaaa 
ttagctcaatccttattgaatcaaattataagtgaacgatcacttaatatttcagaaatg 

gaaaaagaattatttattgaaggtgcctacatttattatctaggtagtttcattgattca
gactcaagttcaccacatacgtattacttaatcgcaaattcaatgattaacggcttttca 
cataaagatcgtgtgaaattagctttgttagctagttttaaaaacaaatctttacttaaa 
ttttattgcaaagaaacacagtggtttagtaataaagaaatagatacaatacaagcttta 
ggag.rga^.tattaaatttgcaaacaccttgaatatctcacatactagttttgtagaggaa 

gttaaactaaaagcaaagaaagatgacaaatacgatttattagtttattacaaaggttca
cctattgcagaagaataccaagcaaatcgtcagaaaaagcatattgagaaaattttaaaa 
ggtaaggtttctattatatttacaaaatcttaa

Sequence 2860 

MEERIGLIDIGSNTIRLVIFGYNKKTGLNEILNIKTPARLSQYLTKSNEMNDEGIHVLKE
TLSSFRKVADKFNVDALYPIATAAIRQSKNREAIIKEIKQDIHIEIQIVPEEDEAFYGYY 
AITHTTDIENGISVDIGGGSTEVTLFKDKQLKEAHSFPFGVVSLKRQFFGDKEHNDKTAI 
KNMEQFLREQFSQLDWLSNQHIALVGVGGSARNVARIHQSAHAYPIGGVHNYTMTSKDIN 
NVYDLIRKSSRDELTNLDGLSRDRVDIILPAISVFKTLFKKIDATQFTFSRKGIREGFIM 

NHISKRYPDEFNKSNVRKDALRHLANEYHIEETSANRRVKLAQSLLNQIISERSLNISEM
EKELFIEGAYIYYLGSFIDSDSSSPHTYYLIANSMINGFSHKDRVKLALLASFKNKSLLK 
FYCKETQWFSNKEIDTIQALGGIIKFANTLNISHTSFVEEVKLKAKKDDKYDLLVYYKGS 
PIAEEYQANRQKKHIEKILKGKVSIIFTKS* 

Sequence 2861

Contig_0471_pos_6659_4476

is similar to (with p-value 1.0e-24)

>gp:gp|AF083928|AF083928_2 Vibrio cholerae polyphosphate ki
nase (ppk) and exopolyphosphatase (ppx) genes, complete cds.
NID: g3452464.

atgaggtgtatgtatagtatgcaaactcgattgggagaaaaagatattaatttaccgcag 
tattacaacaatagggagttaagttggctagattttaactacagagtattacaagaatca 
tatgataaaaataatccgttgcttgaaaaacttaactttatttctatcttcagttcaaat 
ttagatgaattctttatggttcgagtggctgggttaaaagaccaagtcaaaatgggatat 

gacaaacctgaaaataaagcacagatgacgcctcaagaacaacttgatgctattaaaatt
aaaaatactgactatgtgaacactcaatatcaacgttataacgaattaattaaagaatta 
gccaattacgatattgagatggtaaaacctgaagacttatcagatgcattgatagaaaaa 
ttagaaaaagagttcaagttaagtgtcttgccgacactcactccgttaggtattgatgcg 
tatcatccatttccaaagttaaataataaaagtttaaatatttttgttgatatcgatacg 

gaagatgccattaattcagctatcgttcaaattccttcattaattccacgctttttaact
ttaaatgagggtacaaaacaatacgttgtcatggtagaagatgtaattacgtatttcatc 
aattatctatttacaggatacgaagtactaaatacttttactttccgaataacacgtaat 
gcagatttaaccattcatgaagatggcgctgaggacttgcttatagaaattgaacgtttc 
ttgaaagaacgtaagagtggttcggctgtacgtttagaattagattgtcgcacttctgaa 

aaagagaatgtagaatggttaatcgatcaattagaaattgaagataatgatatttattat
ttagatggtccacttgatttaacattcttatttggattggttgatcatctatctcataag 
ctcaaatatttaacgtatgagaaatatacccctcaaccacctagatcattaggcaacaag 
aatatctatcaattatcattagaaagagatatattcttccaccatccgtatgaatcattt
gaaccaatagttgactttattcgacaagcagcagatgacccaaatacaatcgctatcaaa 

caaaccttgtatcgagtgagtaaggattcgccgattattaacagcttaaaagaagccgct
gaaaacggcaagcaagtaacggtgctcgtagaattaaaagcacgctttgatgaagaaaat 
aacgtacattgggcacgtatgttagaagatgctggctgtcacgttatttatggtatgaca 
catctaaaaacgcatagtaaaattgcgctagtcgttaaacgcatcaacaataaacttacg 
tcatttgttcatttaggcacaggtaactataacgataaaactgctaaattatacacagat 
atgggtatcatcacgacaaataaagatatcgctgaggatgcaattaacttctttaattac
ttgagtggttactcaacaaaaccagaatacaataaattgattgtagcaccatacgatatt 
cgagacgtcttcattgatcgtatcgataaagaaatacgtagtcatttacaacatggtaac 
ggtaaaattatgatgaaaatgaactctttaaccgataaaacgattatcgaaaagctcttc 
gaagcatcccaagcaggcgttaaaatacaactcatcattcgtggtatatgttgtcttaaa
ccaggcattccaggtattagcgaaaatatagaggttgttagtattgtgggtcgtttgctt
gaacattcacgtatttactacttccataataatggtgaggcgcatatttatttatcttca 
gctgacgttatgacacgtaatatgattaaacgtgtcgaaatattattccctgttgaagat 
aaatcaataggacaacgattagttaactatatgaatttacaattatctgacaaccaaaaa 
ggtcgttaccaagatgcacaaggcgtttatcattatgtcgaaaacaattcatctccttta
aactctcaatcttacttgatgcaagaagcaattaagtatggagaagaactaaaaaaacaa 
tcggtacaaccttctggacaacccgttcattctagacgtggtggtagttggatacgaaaa 
ttaaaaagtacatttaaaagataa 

Sequence 2862

MRCMYSMQTRLGEKDINLPQYYNNRELSWLDFNYRVLQESYDKNNPLLEKLNFISIFSSN 
LDEFFMVRVAGLKDQVKMGYDKPENKAQMTPQEQLDAIKIKNTDYVNTQYQRYNELIKEL 
ANYDIEMVKPEDLSDALIEKLEKEFKLSVLPTLTPLGIDAYHPFPKLNNKSLNIFVDIDT 
EDAINSAIVQIPSLIPRFLTLNEGTKQYVVMVEDVITYFINYLFTGYEVLNTFTFRITRN

ADLTIHEDGAEDLLIEIERFLKERKSGSAVRLELDCRTSEKENVEWLIDQLEIEDNDIYY
LDGPLDLTFLFGLVDHLSHKLKYLTYEKYTPQPPRSLGNKNIYQLSLERDIFFHHPYESF 
EPIVDFIRQAADDPNTIAIKQTLYRVSKDSPIINSLKEAAENGKQVTVLVELKARFDEEN 
NVHWARMLEDAGCHVIYGMTHLKTHSKIALVVKRINNKLTSFVHLGTGNYNDKTAKLYTD
MGIITTNKDIAEDAINFFNYLSGYSTKPEYNKLIVAPYDIRDVFIDRIDKEIRSHLQHGN 

GKIMMKMNSLTDKTIIEKLFEASQAGVKIQLIIRGICCLKPGIPGISENIEVVSIVGRLL
EHSRIYYFHNNGEAHIYLSSADVMTRNMIKRVEILFPVEDKSIGQRLVNYMNLQLSDNQK 
GRYQDAQGVYHYVENNSSPLNSQSYLMQEAIKYGEELKKQSVQPSGQPVHSRRGGSWIRK 
LKSTFKR* 

Sequence 2863

Contig_0473_pos_5421_6053

is similar to (with p-value 8.0e-31)

>gp:gp|D64024|D64024_2 Sulfolobus sp. DNA for 2-oxoacid:fer
redoxin oxidoreductase subunit alpha and beta, complete cds.
NID: g1565182.

atggcaaacaaagatttaacagttatcgcttctggtggtgatggagacggctatgcaata 
ggaatgggacatactattcatgctcttagacgtaatatgaatatgacgtatattgtcatg 
gacaatcaaatatatggattaactaaaggacaaacatcaccttcctcagctaaaggattt 
gtaactaaatcaacacctaaaggaaatatagaaaagaatgtagctccattggaattggca 

ctgtcctctggtgcaacttttgtagcacaaggattctcaagtgatataaaggcattaact
aaaatgattgaagatgcgattcatcatgatggtttttcttttgttaatgttttctcacct 
tgtgttacttacaataaagtgaatacttatgactggtttaaagaacatttaacaagtatc 
gatgatattgagggctatgacatcacagataaacaacttgctatgaaaactgtgctggat 
catgagtcactggttaaaggtatcgtttatcaagatacaacaacaccttcttatgaatcg 

caaatttcagaactagaacatgaggcgttagctaaaagagatattcatattacagaagaa
actttcaacgatttaactgcacaatttttataa 

Sequence 2864 

MANKDLTVIASGGDGDGYAIGMGHTIHALRRNMNMTYIVMDNQIYGLTKGQTSPSSAKGF
VTKSTPKGNIEKNVAPLELALSSGATFVAQGFSSDIKALTKMIEDAIHHDGFSFVNVFSP
CVTYNKVNTYDWFKEHLTSIDDIEGYDITDKQLAMKTVLDHESLVKGIVYQDTTTPSYES
QISELEHEALAKRDIHITEETFNDLTAQFL* 

Sequence 2865
Contig_0476_pos_11219_12265

is similar to (with p-value 3.0e-44) 

>sp:sp|P36649|YACK_ECOLI PROBABLE 53.4 KD BLUE-COPPER PROTE
IN YACK PRECURSOR. >gp:gp|AE000121|AE000121_8 Escherichia co
li K-12 MG1655 section 11 of 400 of the complete genome. NID
: g1786306.

atgtataataaagtttttgcaattttaattataattttttccataataattattgcgtct 
aatgatactttcgcagaaagtaagaatgatatgatgaatatgaaagaagataagaaaaat 
acaatggatatgacaaatatgaaacatcatgacgaaagaaagaaattaaattcttcacaa 
ggaaaaaatgaaataatatttcctaaagttgcagagtcaaaaaaagataacaatggttat
aaaaattatacattaaaagctcaggaaggaaagacagagttttacaaaaataatttttct 
aatactctaggctacaatggaaatttacttggaccaactttaaaattaaaaaaaggagat 
aaagttaaaattaagttaataaataacttagatgaaaatacaacatttcattggcatgga 
ttagaagtaaatggaaaagtggatggagggccttctcaggttataaaaccaggaaaagaa 

aaaactataaaatttgaggttaatcaagattctgctacgttatggtatcacccccacccc
tctccaaatacagctaaacaagtttataatggcttatcaggattattatatatagaagat 
agtaaaaagaataattatcctagtgattatggaaaaaatgatttgcctataataatccaa 
gataaaacatttgtatctaaaaaattaaattattcaaaaacgaaagacgaagatggcact 
caaggtgatactgttcttgtgaacggaatagtaaaccccaaactgacaacaaaagaagag 

aaaatacgtttgagacttttaaatggttctaatgctcgagatttaaatcttaagctaagt
aataatcaaagttttgagtatattgcttcagatggcggtcaattaaaaaacgctaaaaaa 
ttaac?ayoaattaatttagctccttcagaaagaaaagaaatagtaatagatttat-: taaa 
atgaaaggcgagaaaatcagtctggttgataatgataaaactgtaattttaccgattagt 
aaacaactggtaagtgtaagatattag 


Sequence 2866 

MYNKVFAILIIIFSIIIIASNDTFAESKNDMMNMKEDKKNTMDMTNMKHHDERKKLNSSQ 
GKNEIIFPKVAESKKDNNGYKNYTLKAQEGKTEFYKNNFSNTLGYNGNLLGPTLKLKKGD 
KVKIKLINNLDENTTFHWHGLEVNGKVDGGPSQVIKPGKEKTIKFEVNQDSATLWYHPHP 
SPNTAKQVYNGLSGLLYIEDSKKNNYPSDYGKNDLPIIIQDKTFVSKKLNYSKTKDEDGT
QGDTVLVNGIVNPKLTTKEEKIRLRLLNGSNARDLNLKLSNNQSFEYIASDGGQLKNAKK 
LKEINLAPSERKEIVIDLSKMKGEKISLVDNDKTVILPISKQLVSVRY* 

Sequence 2867 
Contig_0476_pos_6429_5413

is similar to (with p-value 6.0e-55) 

>gp:gp|X97452|ECPM_12 E.coli paa cluster for phenylacetic 
acid degradation. NID: g2764820. 

atgtctcttttggatgacgtcattttaggtaatacggtaggtaatggggggaatttagct 

agaaaatcattacttgaagcgggattagattttaaaatacctggtataacaattgatcgt
caatgtggctcaggtcttgaagccgttatacaagcctgtaggatggtacaaagtggtgct
ggaacaatatatattgcaggtggtgttgagagtaccagtagagcaccttggaaaatcaaa
cgtccgcagtcagtttatgaatctgagtttccacaattttttgaacgggcgccttttgca 
agagaaggagaagacccttcaatgattgaagcagccgaaaatgtagcgaagaaatatcat 

atcagtagaaatgaacaagatgactttgcgtatcgcagtcatcagttggcatcaaaaaat
atgaataacggtaatatttcccaagaaattttaccgttcaaagtgaaaggtgaatatttt 
aatcaagatgaaagtattaaacctcaacttactctcagaacacttggcagacttaaacca 
cttttaaatgaaggaacagtcacagtaggaaatagttgtaagaaaaatgatggtgcagta 
ttactgattgttatggaagaaaatcgggcacgtcaattaggattcacagaagggattaag 

tttgtgaatagtgcaactgtaggtgttcaaccacagtatttaggagtaggtccagtgcca
gcagtaaatcaattattagctcaagaacgattaactataaatgatataaatgcagtagaa 
ttaaatgaagcatttagctctcaagttattgcgagccaacaacagcttaacattcctttg 
aataagttgaattgttggggaggagcaattgctacagggcatccatatggtgcaagtgga 
gcagcgttagtcacacgtttattttatatgaaacatcaatttagaactatagcaactatg 

ggaataggtggagggataggaaatgcagctttatttgaaagatggtatggaaattag

Sequence 2868 

MSLLDDVILGNTVGNGGNLARKSLLEAGLDFKIPGITIDRQCGSGLEAVIQACRMVQSGA 
GTIYIAGGVESTSRAPWKIKRPQSVYESEFPQFFERAPFAREGEDPSMIEAAENVAKKYH 
ISRNEQDDFAYRSHQLASKNMNNGNISQEILPFKVKGEYFNQDESIKPQLTLRTLGRLKP
LLNEGTVTVGNSCKKNDGAVLLIVMEENRARQLGFTEGIKFVNSATVGVQPQYLGVGPVP
AVNQLLAQERLTINDINAVELNEAFSSQVIASQQQLNIPLNKLNCWGGAIATGHPYGASG
AALVTRLFYMKHQFRTIATMGIGGGIGNAALFERWYGN* 
Sequence 2869
Contig_0477_pos_254_727
is similar to (with p-value 8.0e-35)
>gp:gp|U96107|SCU96107_3 Staphylococcus carnosus N5,N10-met
hylenetetrahydromethanopterin reductase homolog, SceB precur
sor (sceB) and putative transmembrane protein genes, complet
e cds, and putative Na+/H+ antiporter NhaC (nhaC) gene, part
ial cds. NID: g2735503.

atgaaaaaaatcaaaacaatctcgacattggtagctggacttggtatagcatttctaggt 
cacacaacacatgcagatgcggctgaaaataacaatcaacaacaaagtacatataactat
agtacaactgaagtatcattttctaattcaggaaatttatatacttctggccaatgtact 
tggtatgtttatgataaaactggtggaaaaatcggatcaacatgggggaatgcaaatagc 
tgggcaactgcagctcaagcagcaggattcactgtaaataatacacctgaagaaggtgca 
attatgcaatcatctgaaggtgctttcggacatgttgctttcgttgaaagtgtcaataat 
gatggttctattactgtatcagaaatgaactatgatggtggtccattcgctataagcaca
cgaacaatctctgccagtgaagcaagttcatataattacatccacctgaattaa 

Sequence 2870 

MKKIKTISTLVAGLGIAFLGHTTHADAAENNNQQQSTYNYSTTEVSFSNSGNLYTSGQCT 
WYVYDKTGGKIGSTWGNANSWATAAQAAGFTVNNTPEEGAIMQSSEGAFGHVAFVESVNN
DGSITVSEMNYDGGPFAISTRTISASEASSYNYIHLN* 

Sequence 2871 

Contig_0478_pos_6998_6654

is similar to (with p-value 1.0e-60)

>pir:pir|I67760|I67760 transposase (insertion sequence IS10
) - Escherichia coli >gp:gp|S67119|S67119_2 BST=somatotropin
...aST/beta-Gal fusion protein [Escherichia coli, LBB84 pla
smid pXT107, IS10L/R-1, PlasmidTransposonInsertionMutant, 3
genes, 1679 nt]. NID: g455674.

atgcagattgaagaaaccttccgagacttgaaaagtcctgcctacggactaggcctacgc 
catagccgaacgagcagctcagagcgttttgatatcatgctgctaatcgccctgatgctt 
caactaacatgttggcttgcgggcgttcatgctcagaaacaaggttgggacaagcacttc
caggctaacacagtcagaaatcgaaacgtactctcaacagttcgcttaggcatggaagtt 

ttgcggcattctggctacacaataacaagggaagacttactcgtggctgcaaccctacta
gctcaaaatttattcacacatggttacgctttggggaaattatga 

Sequence 2872 

MQIEETFRDLKSPAYGLGLRHSRTSSSERFDIMLLIALMLQLTCWLAGVHAQKQGWDKHF 
QANTVRNRNVLSTVRLGMEVLRHSGYTITREDLLVAATLLAQNLFTHGYALGKL*

Sequence 2873 

Contig_0480_pos_5372_6550

is similar to (with p-value 2.0e-24) 

>sp:sp|P23524|YHAD_ECOLI HYPOTHETICAL 42.1 KD PROTEIN IN RN
PB-SOHA INTERGENIC REGION (ORF 3) (F408). >pir:pir|JQ0614|JQ
0614 hypothetical 42K protein - Escherichia coli >gp:gp|D902
12|ECORNPBW_3 E.coli rnpB gene and ORFs. NID: g216630. >gp:g
p|U18997|ECOUW67_54 Escherichia coli K-12 chromosomal region
from 67.4 to 76.0 minutes. NID: g606010. >gp:gp|AE000394|AE
000394_2 Escherichia coli K-12 MG1655 section 284 of 400 of
the complete genome. NID: g2367197.

atgtttaaaataatttttggaaaagagaaaaataaggtggttaagacaatgaaagtttta 
gtagccatggatgaatttaatggaattatttctagttaccaagctaatagatatgttgaa 
gaagcggtagcaagtcaaattgaagatgcagatatcgttcaagttccactatttaacggt
cgtcacgaattattagattcagtctttctttggcaatcaggaaataaatatcgtgtgagt 
gcgcatgatgctgacatgaaagaaaccgaagcaatatatggacaaacggatagtggtatg 
actattatcgaaggtcacttatttttaaatggcaaaaaacctattcaacatcgatcaagt 
tacggtttgggagaggttataaaagcagcattggacaatcatacagaacatcttgttatt 
tctttaggtggaataggaagttttgatggcggtgcaggcatgttgcaagcattgggtgca
acattttatgatgatgaagcacaaattgtcgatatgaggaaaggtgcatatttaataaaa 
tatattagacgtattgatttatcaggtgttcatccacaattaacaaaggtaaacattcaa 
ttaatgtcagatttctcaagtcgattgtatgggaaaaaaagtgaaatcatgcaaacatac 
gaatcattagatttgtctcaaaatgaagcagccgagatagataatttaatttggtatttt
agtgaattatttaagaatgaattgaaaatagcaatgggaccaatcgagcgcggtggtgct 
ggaggtggtatagcagctgtattaaatagtctctatcaagctgagattttaacaagccat 
gaattagtgaatcaaatcacacatttagaaaacttaattcaacaggcagatcttattatt 
ttcggagaaggtttgaaagaagaagatcaaattctagagactacaacaatacgtatagca 
gaacttacccaacaatacagcaagccagctattgcaatttgtgctacaaatgataaattt
gatttgtttgaatcattgaatgttacagcaatgtttaatacatttattgatatgcctgat 
tcatatacagattttaagatgggtattcaaatcagacattacacagtacaagcactaaaa 
ttattgaaaacgcaaataaacttaccgctttcatcctaa 

Sequence 2874

MFKIIFGKEKNKVVKTMKVLVAMDEFNGIISSYQANRYVEEAVASQIEDADIVQVPLFNG 
RHELLDSVFLWQSGNKYRVSAHDADMKETEAIYGQTDSGMTIIEGHLFLNGKKPIQHRSS 
YGLGEVIKAALDNHTEHLVISLGGIGSFDGGAGMLQALGATFYDDEAQIVDMRKGAYLIK
YIRRIDLSGVHPQLTKVNIQLMSDFSSRLYGKKSEIMQTYESLDLSQNEAAEIDNLIWYF 

SELFKNELKIAMGPIERGGAGGGIAAVLNSLYQAEILTSHELVNQITHLENLIQQADLII
FGEGLKEEDQILETTTIRIAELTQQYSKPAIAICATNDKFDLFESLNVTAMFNTFIDMPD 
SYTDFKMGIQIRHYTVQALKLLKTQINLPLSS* 

Sequence 2875 

Contig_0483_pos_5289_6392

>pir:pir|G64047|G64047 cystathionine gamma-synthase (metB)
homolog - Haemophilus influenzae (strain Rd KW20) >gp:gp|U32
694|U32694_2 Haemophilus influenzae Rd section 9 of 163 of t
he complete genome. NID: g1573035.

atgaaggatacagatttagctcaaattgctttaacacaagatcacactggcgcaattgcc
aatccaatatatttatctactgcatatcagcatcctcacctaggtgaatcaacaggctat 
gattatacacgaactaaaaatccaacaagaacagcctttgaagaagcttttgcacaactt 
gaaaaaggcattgcttcatttgctacttccagtggtatggcggcgattcagttaatatgt 
aatatattcaaaccaggtgatgaaattctcgttgcatttgacctatatggtggaacattt 

cggttattcgatttttacgaaaaacaatatggtttgaagtttaaatatgtagacttttta
aattatgaagaagtggaaaaaaacatcactccacaaactagagcattatttattgaacca 
atctcaaatccacaaatgattgaaattgatgtagaaccatattatatccttagcaaaaaa
catcaactattaacaattatcgacaacacttttttaacaccttatctttcgacaccactc 
gae-.gaagntgcagatatcgttctgcattcagcaacaaaatatattggcggacataacgat 

gtgttagctggagttgtaactgttaaggatgctcaattagctgaacaattgaatcaattc
cataatatgattggagcaactctatcacctcttgatagttatcttttacaaagaggtcta 
aagacattacatcttcgcatagagcgttcccaagaaaatgctcaaaaacttgcacaacga 
tgtcgccagtcagattcaattgatgaagttttatatagtggacgaacaggcatgcttagt 
ttaagactgaaccaagcatatagcgtcgctaaatttttagaaaatttagaaatttgtata 

tttgcagaaagcttaggtggtacagaaacatttatcaccttcccttatacacaaacacac
gttgatatgccagatgaggaaaaagacaaacgtggaattgatgaatatctcatcagattg
tccgtaggtatagaagactataacgatatagaagctgacataattcaagcattagagaat 
tctaaagtaggagtgatttcatga 

Sequence 2876

MKDTDLAQIALTQDHTGAIANPIYLSTAYQHPHLGESTGYDYTRTKNPTRTAFEEAFAQL 
EKGIASFATSSGMAAIQLICNIFKPGDEILVAFDLYGGTFRLFDFYEKQYGLKFKYVDFL 
NYEEVEKNITPQTRALFIEPISNPQMIEIDVEPYYILSKKHQLLTIIDNTFLTPYLSTPL 
EEGADIVLHSATKYIGGHNDVLAGVVTVKDAQLAEQLNQFHNMIGATLSPLDSYLLQRGL

KTLHLRIERSQENAQKLAQRCRQSDSIDEVLYSGRTGMLSLRLNQAYSVAKFLENLEICI
FAESLGGTETFITFPYTQTHVDMPDEEKDKRGIDEYLIRLSVGIEDYNDIEADIIQALEN 
SKVGVIS* 

Sequence 2877
Contig_0484_pos_4823_4407

>sp:sp|O69282|MQO_CORGL MALATE:QUINONE OXIDOREDUCTASE (EC 1
.1.99.16) (MALATE DEHYDROGENASE (ACCEPTOR)) (MQO).
atgataggtactatgattgaaacgcctagagcatgcttgattgcgaatgaccttgcgaaa 
cattgtgatttcttcagttttggtactaatgatttaacgcaattgacatttggtttctct
agagatgatgcaggaaaattcataaatgtgtatactgaaaataacattttacagcttgac 
ccattccaaactttagatagagaaggtgtaggacgactaattcaattagctgttgaacaa 
gctaaaaatacaaatccagagataaaaattggtgtatgtggtgagcttggtggcgatgca 
aaatcaattcgtaaatttaaccaatgggaaattgattatgtttcatgttcaccatttaga 
gttccgggtgcaattttagctacagctcagagtcaagcggaggaaagcgagcgataa

Sequence 2878 

MIGTMIETPRACLIANDLAKHCDFFSFGTNDLTQLTFGFSRDDAGKFINVYTENNILQLD 
PFQTLDREGVGRLIQLAVEQAKNTNPEIKIGVCGELGGDAKSIRKFNQWEIDYVSCSPFR 
VPGAILATAQSQAEESER*

Sequence 2879 
Contig_0484_pos_3408_1879
is similar to (with p-value 6.0e-42)
>pir:pir|S53297|S53297 pyruvate, orthophosphate dikinase (EC
2.7.9.1) - Flaveria pringlei >gp:gp|X75516|FPPDK_1 F.pringl
ei mRNA for pyruvate, orthophosphate dikinase. NID: g577775.

atgataactactacacaagaggtgaacattatggctatgtctgacaaaaaagacgtcgtg 

ttaatcggtgctggtgtactaagtactacatttggttctatgttgaaaacgattgcacct
gattgggacattcatttatatgaacgtctagatcgtcctggtattgaaagttcaaatgaa 
cgtaacaatgcaggaacaggacatgcagctttatgtgaattgaactatactgtacaacaa 
cctgatggttcaattgatattgaaaaagctaaagaaattaatgaacaatttgaaatttct 
aaacaattctggggtcatttagttaaatcaggagaaattcaaaatcctaaagaatttatt 

aatccattacctcatattagttttgttcgtggtaaaaataacgttaaattcttaaaagat
cgttatgaagcgatgaagcaattccctatgttcgataatatcgaatatactgaagatatt
gaagaaatgagaaaatggattccattaatgatgaaaggccgtgaagataagggctacatg 
gcagcgagtaaaatagacgaaggaactgacgtaaactacggtgaattaactcgtaaaatg 
gctcaaaatcttaaaaactcaccaaacgttgaagtgcaatacaaacatgaagttgttgat 

tttgaacgtttgtctaatggtaaatggtcagttaaaattaaaaatctaaataatggacaa
gtattcgaacatcaaactgattatgtgtttatcggtgctggtggtggcgcaattccacta 
ttacaaaaaactggcattccagaaagtaaacatttaggtggattcccaatcagtggtcaa 
tttattgcttgtacaaatccgcaagttattgaacaacacgatgccaaagtttatggtaaa 
gaaccacctggtacaccaccaatgacggtacctcacttagatacacgttacattgatggt 

gaaagaacattattatttggaccatttgcaaacgtgggacctaaattcctaaaacatggt
tctaacttggatttattcaaatcaattaaaccatataacattacgactttacttgcttct 
gcagttaaaaacttaccattaattaagtattcatttgaccaagtcatcatgacaaaagaa 
ggttgtatgaaccatttacgtacgttctatcctgaagcacgtgatgaagattggcaagtt 
tatacagctggtaaacgtgtacaagttattaaagatactgaagagaatggtaaaggattt 

atccaatttggtactgaagtggttaattctgaagaccactcagttattgcattactaggc
gaatcaccaggagcatcaacttcagtatcagtagcactagaagttttagagaaaaacttc 
cctgaatacgcgaaagactgggaacctaaaatcaagaaaatgattccttcatatggtgaa 
tcacttattgatgatgttcaattaatgagaaaaatacgtaaacaaacatctaaagatctt 
gaattaggattctataataaagcaaaataa 


Sequence 2880 

MITTTQEVNIMAMSDKKDVVLIGAGVLSTTFGSMLKTIAPDWDIHLYERLDRPGIESSNE 
RNNAGTGHAALCELNYTVQQPDGSIDIEKAKEINEQFEISKQFWGHLVKSGEIQNPKEFI 
NPLPHISFVRGKNNVKFLKDRYEAMKQFPMFDNIEYTEDIEEMRKWIPLMMKGREDKGYM 
AASKIDEGTDVNYGELTRKMAQNLKNSPNVEVQYKHEVVDFERLSNGKWSVKIKNLNNGQ
VFEHQTDYVFIGAGGGAIPLLQKTGIPESKHLGGFPISGQFIACTNPQVIEQHDAKVYGK 
EPPGTPPMTVPHLDTRYIDGERTLLFGPFANVGPKFLKHGSNLDLFKSIKPYNITTLLAS 
AVKNLPLIKYSFDQVIMTKEGCMNHLRTFYPEARDEDWQVYTAGKRVQVIKDTEENGKGF 
IQFGTEVVNSEDHSVIALLGESPGASTSVSVALEVLEKNFPEYAKDWEPKIKKMIPSYGE
SLIDDVQLMRKIRKQTSKDLELGFYNKAK*

Sequence 2881 
Contig_0486_pos_1073_1840
is similar to (with p-value 9.0e-58)

>gp:gp|U38892|SSU38892_2 Synechocystis sp. ruvB gene, compl
ete cds, and secA gene, partial cds. NID: g1256587.
atgtttaaaataggaaatttagaattacaatctcgtttacttttaggtactggaaaattt 
gaaaatgaagaggttcagtcaaaagcaattgaggcatctgaaacaaatgtacttacattt 

gcagtaagacgtatgaatttatatgatcgtaacctacctaacccacttgcaaacgttaat
ttaaaagattttatcacttttccaaatactgcaggtgccaaaacagctcaagaagctatc 
agaattgctgaaattgctaatcacgcaggtgtatgtgacatgattaaagtcgaagtcatt 
ggtgatgacgaaacattattacctgatccattcgaaacatacgaggcatgcaaagtattg 
ttagaaaaaggttacactgtttgtccttacatctctaacgatttagttttagctcaacgt 

ttagaagaattgggtgtacacgcagttatgccacttgcatcccctattggtacaggaaga
ggtattaataacccattaaatttaagttatattatcgaaaatgctagtgtacctgtaatc 
gtagatgctggtattggttcccctaaagatgcgtgtcatgccatggagcttggcgcagat 
ggtattttactcaacacagccatttcagcggcaaaagatcctgtgaaaatggctgaagca 
atgaaattaggtataaatgctggcagactttcatatgaagctggacgcattcctgttaag 

tatactgcacaagcatctagtccatcagaaggtttagggttcttgtaa

Sequence 2882 

MFKIGNLELQSRLLLGTGKFENEEVQSKAIEASETNVLTFAVRRMNLYDRNLPNPLANVN 

LKDFITFPNTAGAKTAQEAIRIAEIANHAGVCDMIKVEVIGDDETLLPDPFETYEACKVL
LEKGYTVCPYISNDLVLAQRLEELGVHAVMPLASPIGTGRGINNPLNLSYIIENASVPVI
VDAGIGSPKDACHAMELGADGILLNTAISAAKDPVKMAEAMKLGINAGRLSYEAGRIPVK 
YTAQASSPSEGLGFL* 

Sequence 2883 

Contig_0487_pos_1335_2006

is similar to (with p-value 5.0e-34)
>sp:sp|P39762|AMPS_BACSU AMINOPEPTIDASE AMPS (EC 3.4.11.-).
>gp:gp|AF012285|AF012285_21 Bacillus subtilis mobA-nprE gen
e region. NID: g3282109. >gp:gp|Z99111|BSUB0008_117 Bacillus
subtilis complete genome (section 8 of 21): from 1394791 to
1603020. NID: g2633699.
atgacgaattatcataataagttaaaacaatatgcagaattattagtaagagtgggaatg 
aatgtacaaccacagcaacctgtttttatacgttcatctgttgaagcgttagaattaact 
catttaatcgtcgaggaagcatataaagcaggggcagaagatgttcgagtgagctacaca 

gacccgaaattaaaaagattaaaatttgaaaacgaatcagttgaacactttgaaaaacaa
gaactcaaacaatatgatattgaagagcgtctggattatgttaatcgtggcgcagcgaac 
ttygcgctcattgctgaagatccagagctattaaatggaatagatgcgcaaaagttaaaa 
gcgtatcaaactgtatactcaaaaggatttaaaccatatatggaagcaagtcaaaaaaac 
caatttccatgggtagtggctgcgttccctactagggattgggcacgtcgtgtctatcca 

gagttggatgttgaatcagcatatattaaattcattgatgaagtatttgatattgttcgt
gtagatggacaaaatccaattgaaaattgggaaaaacacattaaagatttaagtgttcat 
gctaaacgattacaagagaaaaactatcaagctttacattacatatcagaaaattcatac 
atttggttttga 

Sequence 2884

MTNYHNKLKQYAELLVRVGMNVQPQQPVFIRSSVEALELTHLIVEEAYKAGAEDVRVSYT 
DPKLKRLKFENESVEHFEKQELKQYDIEERLDYVNRGAANLALIAEDPELLNGIDAQKLK 
AYQTVYSKGFKPYMEASQKNQFPWVVAAFPTRDWARRVYPELDVESAYIKFIDEVFDIVR 
VDGQNPIENWEKHIKDLSVHAKRLQEKNYQALHYISENSYIWF* 


Sequence 2885 
Contig_0487_pos_3511_3086
is similar to (with p-value 6.0e-23)
>sp:sp|P14597|DUT_ORFN2 DEOXYURIDINE 5'-TRIPHOSPHATE NUCLEO
TIDOHYDROLASE (EC 3.6.1.23) (DUTPASE) (DUTP PYROPHOSPHATASE)
. >gp:gp|M30023|ORFPRTPS_1 Orf virus homologue of retroviral
pseudoprotease gene, complete cds. NID: g332561.
atgacaaatacattagaaattaaattattatcagaaaacgcgactatgccgaagagagca 
aattctacagatagtggattggacttatacgtatcagaaacgattaacattcctgcacac
gcaactaaagtagttaaaacagatatagcgattaatctgccttatgggtatgaggcgcaa 
gtaagacctagatctggtaaatcacttaaaactaaattgcgtgtagcactaggaacaata 
gaccaaacataccacaaagaaataggtatcatcacagataacataggtaatgaagatatc 
acagtagaaaaaggagaaagattagcgcaattagttgtagcgccagttgtatatcctaca 
cccaaacaggttgattggtttgaaaatgaaagcgacagaggtgcatatggaagcacagga
gaataa 

Sequence 2886 

MTNTLEIKLLSENATMPKRANSTDSGLDLYVSETINIPAHATKVVKTDIAINLPYGYEAQ
VRPRSGKSLKTKLRVALGTIDQTYHKEIGIITDNIGNEDITVEKGERLAQLVVAPVVYPT
PKQVDWFENESDRGAYGSTGE* 

Sequence 2887 
Contig_0487_pos_988_515 

is similar to (with p-value 2.0e-17)

>sp:sp|P41893|PPAL_SCHPO LOW MOLECULAR WEIGHT PHOSPHOTYROSI
NE PROTEIN PHOSPHATASE (EC 3.1.3.48) (LOW MOLECULAR WEIGHT C
YTOSOLIC ACID PHOSPHATASE) (EC 3.1.3.2) (PTPASE) (SMALL TYRO
SINE PHOSPHATASE). >pir:pir|A55446|A55446 protein-tyrosine-p
hosphatase (EC 3.1.3.48), low molecular weight - fission yea
st (Schizosaccharomyces pombe) >gp:gp|L33929|YSPLMPTP_1 Schi
zosaccharomyces pombe low Mr protein tyrosine phosphatase mR
NA, complete cds. NID: g602991.

gtgatactaatgatacatgtagcatttgtatgtctcggtaatatatgtcgttctccaatg 
gctgaggctatcatgagacaaagactacaagaaagaggtatttcagatataaaagttcat
tctagaggaacaggacgttggaatttaggcgaacctccacataacggaacacaaaaaatt 
ctacagaagtaccatattccttatgatggtatggtgagtgaacttttcgaacctgatgat 
gattttgactatattattgctatggaccaaagtaacgtagacaatatcaaacaaatcaat
ccaac>ttt.acaaggacaattgttcaaattgctagaatttagtaacatggaagagaqcgat 
gtaccagatccatactacacaaataattttgaaggtgttttcgagatggtgcaatcatct
tgtgataatttaatagactacatcgtaaaagatgcaaatttgaaagagaggtaa 

Sequence 2888 

VILMIHVAFVCLGNICRSPMAEAIMRQRLQERGISDIKVHSRGTGRWNLGEPPHNGTQKI 
LQKYHIPYDGMVSELFEPDDDFDYIIAMDQSNVDNIKQINPNLQGQLFKLLEFSNMEESD
VPDPYYTNNFEGVFEMVQSSCDNLIDYIVKDANLKER* 

Sequence 2889 

Contig_0488_pos_5255_6256

>sp:sp|P44770|OTC_HAEIN ORNITHINE CARBAMOYLTRANSFERASE (EC
2.1.3.3) (OTCASE). >pir:pir|H64079|H64079 ornithine carbamoy
ltransferase (arcB) homolog - Haemophilus influenzae (strain
Rd KW20) >gp:gp|U32741|U32741_4 Haemophilus influenzae Rd s
ection 56 of 163 of the complete genome. NID: g1573582.

atgaaaaatttacgtaacagaagctttttaactttattagacttttcacgacaagaggta
gaatttttattaacactctccgaagatttgaagcgtgccaaatatatcggcactgaaaag 
cctatgctaaaaaataaaaatatcgcgcttctttttgaaaaagattccactagaacacgt
tgr.rjcat^cgaagttgccgcacatgatcaaggtgcacacgtcacttatcttggacc -.aca 
ggttctcaaatgggtaaaaaagaaactgctaaagatacagcacgtgtacttggtggcatg

tatgatggtattgagtaccgaggtttctctcaacgtactgtagaaacattagcgcaatat
tcaggtgttccggtatggaatggattaaccgatgaagatcaccctacacaagtgcttgct 
gactttttaactgctaaagaagtattgaaaaaagagtatgctgatatcaactttacttat
gttggcgatggacgtaacaatgttgctaacgcattaatgcaaggtgctgccattatgggt 
atgaatttccatcttgtttgtcctaaagaactcaatccgacagaagaattattaaatcgt 
tgcgaacgtattgcgacggaaaatggcggtaacattttaataacagatgatattgataaa
ggcgtgaaagattctgatgttatttatacagatgtttgggtatcaatgggcgaacctgat 
gaagtatggcaagaacgccttaaacttttaaaaccatatcaagttaaccaagcattatta
gaaaaaacaggcaatccaaatgttatttttgaacattgtttaccttctttccacaatgca 
gaaactaaaattggtcaacaaatttatgaaaaatatggcattagtgaaatggaagtcact
gatgatgtcttcgaaagcaaagcttctgtagtattccaagaagctgagaatagaatgcat 
acaattaaagcggtcatggtagcaactttaggagaattctaa 

Sequence 2890 

MKNLRNRSFLTLLDFSRQEVEFLLTLSEDLKRAKYIGTEKPMLKNKNIALLFEKDSTRTR
CAFEVAAHDQGAHVTYLGPTGSQMGKKETAKDTARVLGGMYDGIEYRGFSQRTVETLAQY 
SGVPVWNGLTDEDHPTQVLADFLTAKEVLKKEYADINFTYVGDGRNNVANALMQGAAIMG 
MNFHLVCPKELNPTEELLNRCERIATENGGNILITDDIDKGVKDSDVIYTDVWVSMGEPD 
EVWQERLKLLKPYQVNQALLEKTGNPNVIFEHCLPSFHNAETKIGQQIYEKYGISEMEVT 

DDVFESKASVVFQEAENRMHTIKAVMVATLGEF*

Sequence 2891 

Contig_0488_pos_6270_7211

is similar to (with p-value 3.0e-76)

>sp:sp|Q46807|ARCL_ECOLI CARBAMATE KINASE-LIKE PROTEIN 1. >
gp:gp|U28375|ECU28375_24 Escherichia coli K-12 genome; appro
ximately 64 to 65 minutes. NID: g887800. >gp:gp|AE000370|AE0
00370_9 Escherichia coli K-12 MG1655 section 260 of 400 of t
he complete genome. NID: g2367170.

gtgagtgaaatggctaaaattgtagtagctttaggtggaaacgctttaggaaaatcacca
caagaacaacttgaattagtaaaaaatacagctaaatccctagtaggattaattactaaa 
ggtcacgaaattgtgattagtcacggtaatggaccacaagtaggaagtattaaccttggt 
ctgaattatgcagctgaacacgatcaaggtcctgcttttccatttgctgaatgtggcgct 
atgagtcaagcctacatcggctatcaacttcaagaaagtttacaaaatgaacttcattca 

atgggcatagataagcaagttgtcacactagttacccaagtagaagttgatgaaggcgat
ccagcttttaatagtccaagtaaacccatcggtctgttctacactaaagaagaagcaaat 
cgtattcaacaggaaaaaggttatcaatttgtagaagatgctggtcgaggttaccgtcgc 
gttgtaccatcaccacaaccaatatctattatcgaactggaaagtattaaaactctagta 
gaaaatgacacactcgtcatcgctgcaggtggaggtggtataccagtcattcgcgaacag 

catgatagctttaaaggtatagatgccgtcatcgataaagacaaaacaagtgcattatta
ggtgctgatattcactgtgatcaactcattattttaacagcgattgattatgtttatatc 
aactatcatactgaccaacaacaagcacttaaaacaacaaatatagatacgcttaaaaca 
tatattgaagaagaacaatttgccaaaggcagcatgctacctaaaatcgaatctgccatc
tcctttattgaaaataatcctaacggtagcgtgctcatcacatcattaaatcaattagat 

gcagcactagaaggtaaaattggcacactcattacaaagtaa

Sequence 2892 

VSEMAKIVVALGGNALGKSPQEQLELVKNTAKSLVGLITKGHEIVISHGNGPQVGSINLG
LNYAAEHDQGPAFPFAECGAMSQAYIGYQLQESLQNELHSMGIDKQVVTLVTQVEVDEGD 
PAFNSPSKPIGLFYTKEEANRIQQEKGYQFVEDAGRGYRRVVPSPQPISIIELESIKTLV
ENDTLVIAAGGGGIPVIREQHDSFKGIDAVIDKDKTSALLGADIHCDQLIILTAIDYVYI 
NYHTDQQQALKTTNIDTLKTYIEEEQFAKGSMLPKIESAISFIENNPNGSVLITSLNQLD 
AALEGKIGTLITK* 

Sequence 2893

Contig_0489_pos_5066_5410 

is similar to (with p-value 5.0e-39) 

>sp:sp|P37941|ODBB_BACSU 2-OXOISOVALERATE DEHYDROGENASE BET
A SUBUNIT (EC 1.2.4.4) (BRANCHED-CHAIN ALPHA-KETO ACID DEHY
DROGENASE COMPONENT BETA CHAIN (E1)) (BCKDH E1-BETA). >pir:p
ir|S32487|S32487 3-methyl-2-oxobutanoate dehydrogenase (lipo
amide) (EC 1.2.4.4) E1 beta chain - Bacillus subtilis >gp:gp
|M97391|BACBRANCH_2 Bacillus subtilis branched chain alpha-k
eto acid dehydrogenase E1-alpha, branched chain alpha-keto a
cid dehydrogenase E1-beta, and branched chain alpha-keto aci
d dehydrogenase E2, complete cds. NID: g142610. >gp:gp|D8443
2|BACJH642_247 Bacillus subtilis DNA, 283 Kb region containi
ng skin element. NID: g2627063. >gp:gp|Z99116|BSUB0013_115 B
acillus subtilis complete genome (section 13 of 21): from 23
95261 to 2613730. NID: g2634723.

gtgaattactgtttacaagctgcagatattttggcaaatgacggcatcgatgttgaagta 

gtcgacttaagaacagtttatccactagataaagcaactatcattgaacgctctcaacgt 
actggtaaagttcttcttgttactgaagataatctagagggaagcattatgtctgaagta 
tctgcaattatagctgaaaactgtctgttcgatttagatgcgccaatcatgcgattagct
gcaccggatgtcccatctatgccattttcaccaacattagaaaatgaaattatgatgaac 
ccagaaaagatacaggacaaaatgcgtgaactcgcacaattttaa 

Sequence 2894 

VNYCLQAADILANDGIDVEVVDLRTVYPLDKATIIERSQRTGKVLLVTEDNLEGSIMSEV
SAIIAENCLFDLDAPIMRLAAPDVPSMPFSPTLENEIMMNPEKIQDKMRELAQF* 

Sequence 2895 

Contig_0493_pos_2490_1000

>sp:sp|O69282|MQO_CORGL MALATE:QUINONE OXIDOREDUCTASE (EC 1
.1.99.16) (MALATE DEHYDROGENASE (ACCEPTOR)) (MQO).
atgcacatgagtgaagcaaatcataaaaacatcgttgttgtaggtgcaggaattattggt 
acgtcagtagcgacaatgctttcaaaagtaagtcctaactggcatatcgatatgtttgaa 
agactagaaggcgctggtattgaaagttcaaatgaaaataataatgctgggacaggtcat 

gcggcattatgtgaattaaactatacagttgaacaagatgatggttcaattgatgcatct
aaagcgcaagaaattaatgaacaattcgaattatctagacaattctggggtaatttagtt 
aaaaatggtgatatttctaatcctgaagaatttattcaaccattacctcatatcagtttc 
gttatgggaccaacaaacgttaactttttaagaaaacgttatgaaacactaagaactctt 
ccaatgttcgatacaatcgaatatacagaagacatggaaacaatgagaaaatggatgcca

ttaatgatggaaaatcgtgaaccaggtcatcaaatggcagcaagtaaaattgatgaaggt
acagatgtgaactatggtgcgttaacacgtaagttagcacattacttagaacaaaaatct 
aatgtttcattaaaatacaatcatgatgttgtagatttaacacaaagagaagatggcaaa 
tgggaagttgtcgttgaaaatagagaaactaaagaaaaagtaactaaaatagcagataaa 
gtgtttattggtgctggcggtcactctattccgttattacaaaaatctggcgttaaacaa 

agagaacacctaggtggtttcccaatcagtggtcaattcttaagatgtacaaacccagat
attattaaacaacatgcggctaaagtttacagtaaagagcctcaaggtaagccaccaatg 
actgtaccacaccttgatacacgttatatcaatggtaaacaaacattattatttggtcca 
tatgcgaatatcggccctaaattcttgaaattcggttcaaatctagacttattcgaatca 
atcaaaccatataacattactacaatgttggcttcagcagttaaaaatgtacctttaatt 

aaatattcaattgatcaaatgatcaaaactaaagaaggttgtatgaactatttaagaaca
tttattcctgatgctaaagatgaagattgggaactttacactgctggtaaacgtgttcaa 
gttattaaagatagtgaacaacacgggaaaggtttcgtagtatttggtactgaagttgtc 
aattcagacgacaattctatgattgcattattaggtgaatctccaggggcttcaacatca 
ttatcagttgtattagaagttttagagaaaaacttcgctgatgacaaagaagcatgggaa 

cctgttgttaaagaaatggtaccaacatacggtcgttcattaattaatgacgaaaaatta
atgagagaaacacgtcgcgaaacttctaaaaacttacatttaaatagataa 

Sequence 2896 

MHMSEANHKNIVVVGAGIIGTSVATMLSKVSPNWHIDMFERLEGAGIESSNENNNAGTGH
AALCELNYTVEQDDGSIDASKAQEINEQFELSRQFWGNLVKNGDISNPEEFIQPLPHISF
VMGPTNVNFLRKRYETLRTLPMFDTIEYTEDMETMRKWMPLMMENREPGHQMAASKIDEG 

TDVNYGALTRKLAHYLEQKSNVSLKYNHDVVDLTQREDGKWEVVVENRETKEKVTKIADK
VFIGAGGHSIPLLQKSGVKQREHLGGFPISGQFLRCTNPDIIKQHAAKVYSKEPQGKPPM 
TVPHLDTRYINGKQTLLFGPYANIGPKFLKFGSNLDLFESIKPYNITTMLASAVKNVPLI 
KYSIDQMIKTKEGCMNYLRTFIPDAKDEDWELYTAGKRVQVIKDSEQHGKGFVVFGTEVV
NSDDNSMIALLGESPGASTSLSVVLEVLEKNFADDKEAWEPVVKEMVPTYGRSLINDEKL
MRETRRETSKNLHLNR* 

Sequence 2897 

Contig_0494_pos_12199_12525

>gp:gp|AB009866|AB009866_15 Bacteriophage phi PVL proviral
DNA, complete sequence. NID: g3341907. 

atgaattcagcagtagtagagtcagaactgaattctacaccatcttctttggtattacct 
tctaattcagggaatgtgaataatcctttaggtaatcccacatactcgcgtgaaccatct
tccatagttttcgcaaacataacagctacatatggtggtgtatcgttaccaactgacacg 
atgccgtcctctgatttttctaatccgaacaatgcaactctgtcctctaatggtagtttg 
tggaaaccagcttctacttcaattgttccgttagcaactgccatttctgcaacttggtta
tcaccatatgccttctcgatgtcttga 


Sequence 2898 

MNSAVVESELNSTPSSLVLPSNSGNVNNPLGNPTYSREPSSIVFANITATYGGVSLPTDT 
MPSSDFSNPNNATLSSNGSLWKPASTSIVPLATAISATWLSPYAFSMS* 

Sequence 2899

Contig_0494_pos_16099_14756

is similar to (with p-value 3.0e-17) 

>gp:gp|AB009866|AB009866_13 Bacteriophage phi PVL proviral
DNA, complete sequence. NID: g3341907. 

atggctaatttagatgagcgcaaaaaagaaatcgctaatctgatttctaaagcgcaagaa
gcagtcgaaaaaggcgacctcgaaactgctcgtaatttaaaagctgatattgatgctcaa 
aagaaagagtacgaagaactcgaacagctttcaaaagaaattgaagcgtcagcacctaaa 
caagatgaaccacctaaagatgaaggtgcagaagttgaagataacaaagatggtaattct 
ggagaagaatcagagaacaaaccttctgatgatgaaccagaaggaacttcagatgaagaa 

aaacctgatgatgcaccaaaaccagatgacaaacctgaagaaacaccagaaacacctact
attgaaaaagtagaagaaccaacagaagaagaattaaaaaaagaaaaagacaaaaaagaa 
ggagcgaaacgttctatggctaaattaaaccaaaatccagagacaaacgaagaaat i:cta 
gca c r cqr-.acagtacatgaaatcaaaaggggctaaacgtgacaatg ttaaatctg.' tgac 
gttggcgtaactatcccagaggatattaaatatattcctgaaaaagaagttaagacagtc 

caagacttatcagaattggtacaaaaaacttcagtatcaactgcaagtgggaaatacccg
atcttaaaacgtgctaacgctaaattcaacactgttgctgaattagagaaaaaccctgag 
ttagctcgtccggaattcgaaacaatcaattgggaagtagacacttatcgtggatctatt 
ccgatttcacaagaagcattagatgattcagttgctaacttaactgctattgtttctgaa 
aatattaacgaacaaaaaatcaacactttaaatgaacgtattggtgaagttttaaaagca 

ttcaatcctactagtgtttctaatgttgacgacttaaaagaaattatcaacgttaaatta
gatcctggttatgaccgccaaattatctgtactcaaagtttctatcaaaaactagataca 
ttaaaagatggtaacggtcgttatttactacaagacagtatcatcaacactgcaggtaac 
actgtgttaggtatgaatgtaacagttgtgcgtgatgacttgttaggtaaaaatggagat 
gcattagcatttattggtgatgtaaaacgcggtgtgttatttgcagaccgtacagacgtt 

tctgttcaatggattgaaaatgaaatctacggtaaatacttaatgggtgctttccgtttc
gatgtgaaacaggctgataaaaatgctggtttcttcgtaacatttgaagatgcaacagaa 
cctagtggggatctaggagcataa 

Sequence 2900 

MANLDERKKEIANLISKAQEAVEKGDLETARNLKADIDAQKKEYEELEQLSKEIEASAPK
QDEPPKDEGAEVEDNKDGNSGEESENKPSDDEPEGTSDEEKPDDAPKPDDKPEETPETPT 
IEKVEEPTEEELKKEKDKKEGAKRSMAKLNQNPETNEEILAFEQYMKSKGAKRDNVKSDD
VGVTIPEDIKYIPEKEVKTVQDLSELVQKTSVSTASGKYPILKRANAKFNTVAELEKNPE
LARPEFETINWEVDTYRGSIPISQEALDDSVANLTAIVSENINEQKINTLNERIGEVLKA 

FNPTSVSNVDDLKEIINVKLDPGYDRQIICTQSFYQKLDTLKDGNGRYLLQDSIINTAGN
TVLGMNVTVVRDDLLGKNGDALAFIGDVKRGVLFADRTDVSVQWIENEIYGKYLMGAFRF 
DVKQADKNAGFFVTFEDATEPSGDLGA* 

Sequence 2901 
Contig_0494_pos_14069_13677

is similar to (with p-value 5.0e-25) 

>gp:gp|AB009866|AB009866_13 Bacteriophage phi PVL proviral
DNA, complete sequence. NID: g3341907. 

atgaaaggcgataaagaaataattgcctatttagaaacgaaatacggtaaatctgctatg 
aaacgcataactgactttgcactaactaaagctggcaataaagttgtaagtattatcaaa
ggtaatatgaaaagttttgaagatactggagaatcagtagaagaaactacactttcaaag 
ccgatgacgataaaaggagtaaggaccgttaaaattcattggcgaggtcctaaacaacgt 
tatcgtattatccacctaaatgaatatggtcactttgatcgttctggaaagtgggttaat 
acagctggtaaaggtgttattgaaaacgctatgcgtgaaggcagagaaacgtatttcaga
acagtaaaagaagaaatgagaaagagggtgtaa 

Sequence 2902 

MKGDKEIIAYLETKYGKSAMKRITDFALTKAGNKVVSIIKGNMKSFEDTGESVEETTLSK 
PMTIKGVRTVKIHWRGPKQRYRIIHLNEYGHFDRSGKWVNTAGKGVIENAMREGRETYFR
TVKEEMRKRV* 


Sequence 2903 

Contig_0494_pos_12659_12030

No hits found

atggcagagaaaaactatcgttcatttacagggttaacagaattttattataaagttcat
ggagaaggtggagttcaaaaagttgctgatccagaacgcattaaatatttacaagaaatt
tcagtatctaaagatcaagacatcgagaaggcatatggtgataaccaagttgcagaaatg 
gcagttgctaacggaacaattgaagtagaagctggtttccacaaactaccattagaggac 

agagttgcattgttcggattagaaaaatcagaggacggcatcgtgtcagttggtaacgat
acaccaccatatgtagctgttatgtttgcgaaaactatggaagatggttcacgcgagtat 
gtgggattacctaaaggattattcacattccctgaattagaaggtaataccaaagaagat 
ggtgtagaattcagttctgactctactactgctgaattcatgcaagctaaagttaaaggc 
ttcgaagaagaaaaagcaatgttattaggtcacgatgctaaaggcacaactgttatgaaa 

gacgctatctgggaagctatcttcggtgaatctgcaccaagcagtgatccaaaagaatct
agtggaacagaatcagaactaggcgcataa 

Sequence 2904 

MAEKNYRSFTGLTEFYYKVHGEGGVQKVADPERIKYLQEISVSKDQDIEKAYGDNQVAEM 
AVANGTIEVEAGFHKLPLEDRVALFGLEKSEDGIVSVGNDTPPYVAVMFAKTMEDGSREY
VGLPKGLFTFPELEGNTKEDGVEFSSDSTTAEFMQAKVKGFEEEKAMLLGHDAKGTTVMK
DAIWEAIFGESAPSSDPKESSGTESELGA* 

Sequence 2905 
Contig_0494_pos_11140_6422

is similar to (with p-value 1.0e-33)

>gp:gp|AB009866|AB009866_7 Bacteriophage phi PVL proviral
DNA, complete sequence. NID: g3341907.

atgaaggctatgggcgttcaacgtagtatttcggaaataaaacgtagctttaaaggatta

aacgctgacttaaaactatctaacaacaactttaagtattccgaaaaaagtttaaattca
tataagttaagaactagagaattatcgcaagcagtcaaagaatctaaagctaacgttgca 
gcgttgaaagcaaaataccaagaagcagcaagagcatctggtgtgaatagcaaaaaagcc 
gctcaattaaggcaggaatatagtcgacaagctgacaatctcaactatttacaaaacgaa 
ctcgaccaaacacgtggcaaatacagagaaatgattgcagtaagtaaatcatctgtcggt

agacttgggcaagcattttctgaaataggacctaagataagatccataggagattcaatg
aagtcagtcgggcgtaacatgagtttacacgttactgcaccaattgcagcaggttttggc 
gctgcggtgaagaaaagtatagacttcgacgataccatgcgtaaagtaaaagccacatct 
ggtgctactggagatgaatttaaccagcttagaacaaaagcacttcaaatgggccgagat 
actaaattcacggcctctgaatctgctgaagcaatgaactacatggcgcttgctggttgg 

gacaccaaagatatgctaaaaggtgttggtggtgtaatggatttagctgctgcatctggt
gaagatttagcaagcgtatctgatattgtaactgataacctaactgcatttggtatgaaa 
gctaaagatagtacccactttgctgatgttttggctcaaacgagttcaaaagctaatact
gatgtacgtggtttaggtgatgcgtttaaatatgctgctccagttgctggtgcgttaggt 
tacacggtagaagatacatcaatagctattggtttgatgtctaatgctgggataaaaggt 

gaaaaagccggcacagcattaagaacaatgtttaccaacctatctaaaccaacaaaagca
atgaaagacgaaatggataaactaggaatatctattactgatagcaacggtgaaatgtta 
cctatgagagatgttttagatcaacttagaggtaaaatgggcggtctatctaaagaccaa 
caagcagccgcagctagtacaatatttggtaaagaggccatgagtggtgcattagcagtt 
atcaacgcatcagacgaagattataaaaagctaactaaatccatagacggctctaaaggt



WO 01/34809
PCT/US00/30782



gcttcaaaaagaatggctaaagaaatggaaggcggtattggtggcgcaatgcgtaaaatg 
aaatcggcaattgaaagtttagcgatttcattaggtgatgcattagccccaatgttatat 
aaagctgctaaatggatcacatcattagcgaataagttttctaatttacctactggcgtt 
caaaaaacgattgcagttgtaggattactcgccgcagctattggtccactactaatggtc 
tttggcgttatggcatcaacaattggcactgctataacagtattaggctctttaatgacg
agtatgagaacactatcatttttatctaaaaccagtgcagcagcgactggtatttggaat 
ggcgttactgccactgctcgtggtatcgcaaatggttatagatatgcggtggctgcatta 
accacttctcagacaatacaggctatgaaaactaaaattgctgcagctgcaacaacagct 
tggactacagttactaaaggtgcaactttagcaactaaaggcttgggattagcaataaga 

tttatgactggaccagtcggtatagttattacagccatcggattattagtagcagggctt
attcatttatggaaaacaaatagctcgtttagaaatgcagttatcggcatttggaactca
ataaaaaatgctgcaatagctatatttggttttatcaaaccttatattattaatatttgg 
aacgcaattaaaaactctacaattgccatttggaacgcgattaaaagaagtgctgtaata 
atatggaacgctattaaatttgctgttcaacatcctattcaagcattaaaaaatgtctta 

tcagctttatggaatggcatgaaaaatgctgctattaaaatctggaccgccttaaagaac
ggtgttatagcaattattaaagcatatgttgcgcaagtaaaatttaatatcaaccttatt 
aagcgcatagtagttacgatatttaatgctattaagagtttctctattaaagtttggact 
gcattaaaaaatggtgtgttaggaatcgttcgagctttgcgcaaaggtgttctatctgta 
tttaacgcattaaaaaaaggtattgttgtaatatttaatgctgtaaagaatgccgcagtt 

aaaatctggacggctataaaaaaatcagtagtgaataaagcaaaagcattatggtctgga
gttaaaaatacatggaatgcactcaaaaaaggtacaattggcatatttaaagcagttggc 
agtttcatgagttctaaatggaacagtattaaaaaaggtactgttaataaagcgaaagct
ctatggtcaggcgtcaaaggtgcttggggatcacttaaaaaaggtactcataacaccatg
aatgctgtaggtggcttcatgagcaagaaatggaatggaattaaaagtactactgtatct 

atagtaaatggcatgaaatcgaaagttatgggcaccatgaataaaatgagagacggtatc
aaaacagttaccggtaaaattgggaatctttttggtggaatggttaaaggcgttaaaaaa
ggccttaatggattaatcaaaggtgttaactgggtcgcagataaattaggtatggataaa 
atacctaagattaaattatctacaggtactcaatctacgcatacacaaagttatattacc 
aacggtaaaatcaataaaggtacaatggctactgtaggagatagaggtcgaggtaatgga

cctggtggctttagaaatgaaatgatccgttatcctaacggaagaatggcactaacacct
aacac'^ayatacaacaacattct tacctaaaggctcaagtgtatatagtggcgcgc:aaca 
catgctattttatctaattcaggatatgacactaagaagaaaaaactacctaaatttagt
aaaggtactaaaaagaaagacggtatattagatgttattagctctggtgtaaaaaatgat 
gttaataaagtaaaagacattggtggtaaagcaagagacataggtggtactacgtttgac 

aaagcaaaagacataggtacaaaagcacttgataaagctaaagatgtgtctagcactgtt
atcaagggtattggagatgtttttgattatgtaggtcatcctatgaaattggtaaataaa 
gtctttgagaaagttggttttaacctagactttatgaaaaatgcaccattaccatttgat
ttaatgacagctatgattaagaaacttaaaaatggtattaaagacttctttaatgaaggt
ttagactctgcaggcggtggagatggttcttcgttcactaaattcccaattactacgggg

tattatcctaatggtggtgctcctggttatagttttaatggtggtgctcactttggtatt
gactatggcgctccatatggtacaactatcaatgctaccaatgatggaaatgtaaaagct 
atccacaacttaggtggaggacttgttgcacgacttttaacaggtcagttcacattgttc 
tttatgcacttatctaaaatattaaaacaaggtaaaatcaaagctggagaaccaatggct 
aaaacaggtaattcaggtcaatggactactggtccacacgtacacttccaagttgaaaga 

ggtcgccatgatgacatcacaaacagagggacagtaaaccctgctaaatggctcaaaggt
cacggtggtggaaaagttggtggtagtggttctgtaaacgcacgtagagcaattcaaaga 
gcacaatctattttaggtggacgttataaatcgtcttatattaccgaacaaatgatgaga 
gttgccaaacgtgagtctaacttccaatcagatgcggttaataactgggacatcaacgca 
caaaaaggaacgccttctaaaggtatgttccaaatgattgaaccatcttttagagcatat 

gctaaaccaggacacggaaacatcttaaatccaactgacgaagctatatctgctatgcgt
tacartc'caggtaagtgggttcctattatggggagttggagaagtgcatttaaacotgct 
ggagattatgcttatgctacaggcggggttattaacactgctggattatataatttggca 
gaagatggataccctgagatagtaatccctacagatccaagcagacaatcagatgcgatg 
aaattgttacatcttgctgcaagtaaaattagtggaaataacagaaataaacgacctaac 

caattacgtacacctaatgttactagtaatacagttgataatgcagaattactactacaa
atgatagaaaatcaacagaaacaaataaacgtgttaatggaaatagcacgaagtaataaa 
actattgaaaaacaaccgaaaggtttttcagaacgcgatgtaagtcaggcacaaggttca
aggttaagactcgctgcttatagccagggaggtttataa 

Sequence 2906

MKPMGVQRSISEIKRSFKGLNADLKLSNNNFKYSEKSLNSYKLRTRELSQAVKESKANVA 
ALKAKYQEAARASGVNSKKAAQLRQEYSRQADNLNYLQNELDQTRGKYREMIAVSKSSVG
RLGQAFSEIGPKIRSIGDSMKSVGRNMSLHVTAPIAAGFGAAVKKSIDFDDTMRKVKATS 
GATGDEFNQLRTKALQMGRDTKFTASESAEAMNYMALAGWDTKDMLKGVGGVMDLAAASG
EDLASVSDIVTDNLTAFGMKAKDSTHFADVLAQTSSKANTDVRGLGDAFKYAAPVAGALG 
YTVEDTSIAIGLMSNAGIKGEKAGTALRTMFTNLSKPTKAMKDEMDKLGISITDSNGEML 
PMRDVLDQLRGKMGGLSKDQQAAAASTIFGKEAMSGALAVINASDEDYKKLTKSIDGSKG 
ASKRMAKEMEGGIGGAMRKMKSAIESLAISLGDALAPMLYKAAKWITSLANKFSNLPTGV 

QKTIAVVGLLAAAIGPLLMVFGVMASTIGTAITVLGSLMTSMRTLSFLSKTSAAATGIWN
GVTATARGIANGYRYAVAALTTSQTIQAMKTKIAAAATTAWTTVTKGATLATKGLGLAIR 
FMTGPVGIVITAIGLLVAGLIHLWKTNSSFRNAVIGIWNSIKNAAIAIFGFIKPYIINIW 
NAIKNSTIAIWNAIKRSAVIIWNAIKFAVQHPIQALKNVLSALWNGMKNAAIKIWTALKN 
GVIAIIKAYVAQVKFNINLIKRIVVTIFNAIKSFSIKVWTALKNGVLGIVRALRKGVLSV 

FNALKKGIVVIFNAVKNAAVKIWTAIKKSVVNKAKALWSGVKNTWNALKKGTIGIFKAVG
SFMSSKWNSIKKGTVNKAKALWSGVKGAWGSLKKGTHNTMNAVGGFMSKKWNGIKSTTVS
IVNGMKSKVMGTMNKMRDGIKTVTGKIGNLFGGMVKGVKKGLNGLIKGVNWVADKLGMDK
IPKIKLSTGTQSTHTQSYITNGKINKGTMATVGDRGRGNGPGGFRNEMIRYPNGRMALTP
NKDTTTFLPKGSSVYSGAQTHAILSNSGYDTKKKKLPKFSKGTKKKDGILDVISSGVKND 

VNKVKDIGGKARDIGGTTFDKAKDIGTKALDKAKDVSSTVIKGIGDVFDYVGHPMKLVNK
VFEKVGFNLDFMKNAPLPFDLMTAMIKKLKNGIKDFFNEGLDSAGGGDGSSFTKFPITTG 
YYPNGGAPGYSFNGGAHFGIDYGAPYGTTINATNDGNVKAIHNLGGGLVARLLTGQFTLF
FMHLSKILKQGKIKAGEPMAKTGNSGQWTTGPHVHFQVERGRHDDITNRGTVNPAKWLKG 
HGGGKVGGSGSVNARRAIQRAQSILGGRYKSSYITEQMMRVAKRESNFQSDAVNNWDINA 

QKGTPSKGMFQMIEPSFRAYAKPGHGNILNPTDEAISAMRYIVGKWVPIMGSWRSAFKRA
GDYAYATGGVINTAGLYNLAEDGYPEIVIPTDPSRQSDAMKLLHLAASKISGNNRNKRPN 
QLRTPNVTSNTVDNAELLLQMIENQQKQINVLMEIARSNKTIEKQPKGFSERDVSQAQGS 
RLRLAAYSQGGL* 

Sequence 2907

Contig_0497_pos_4846_3491

is similar to (with p-value 9.0e-50) 

>gp:gp|Y10528|PACIOAB_1 P. aeruginosa cioA and cioB genes.
NID: g2208963.

atggattcagtagaaatagctcgattgttgacgggtatgacacttgcagtgcatatcata
tttgcaacaattggtgttggtatgccacttatgtttgcaatagcagagtttattggcata 
aagaaaaatgatgcgaattatatcacattggctaaaagatggtcgaaaggctacacgata 
accgttgctgttggagtcgttacaggtactattattggtttacaactttcacttgtctgg
ccaacatttatgaaaatgggcggtcatgttatcgcattaccactattcatggaaactttt 

gcattcttttttgaagcaatctttttaagtatttatttatacacttgggaaagatttaaa
aataaatggacacattttttcatatctatacccgttattataggaggttcattctcagca 
ttctttattacatcagtcaattcatttatgaataccccagctggttttgaaattaaaaat
ggtcgtatggtaaatgttcagccattagaagcaatgtttaattcatcgtttatggttcgt 
gctttacatgtagttgcaactgcaggtatgacgatggcgtttatattagcagccatcgca 

gcgtttaaattattacgtcataatcatacagaagatagaacataccatacaaaagctctt
aatttaagcatgattgttggattcatcaataccgttctttcgatgattgcaggagattta 
tctgctaaatttttacacaaagttcaaccagataagcttgcagcatatgaatggcattat 
gatacgcaatctcatgcgaatttagttcttttcggtgtattaaatgaaaaaacacatgaa 
gtttcaggagcactagagattcccggattacttagttttttagcagataatagctttaat 

acaaaagttaaaggtttaaatgaatttcctaaaaatgaattacctcctatgatagtgcat
tactttttcgatttgatggtatcaatgggaatattctgttttatcatttcaggactatat 
atgttatttttaattgttaaaaagttacgtaaatatgtaactagtaatatgatgttatac 
gctatcttgttaactggtcctgcatcaatgttagcgatagaatttggttggttcttaact 
gaaatgggtcgtcaaccgtggattatacgaggatatatgcgcgtatcagaagctgctacc 

caggcggggggaatcacattagtaacgacgctattcggtttactatatcttttattacta
gttacttcggcttatgtactaattagaatgtttaaaaatcaacctgcttataaagatgta 
gaaaaagtgattaaagagagaggtgaaacaaaatga 

Sequence 2908 

MDSVEIARLLTGMTLAVHIIFATIGVGMPLMFAIAEFIGIKKNDANYITLAKRWSKGYTI
TVAVGVVTGTIIGLQLSLVWPTFMKMGGHVIALPLFMETFAFFFEAIFLSIYLYTWERFK
NKWTHFFISIPVIIGGSFSAFFITSVNSFMNTPAGFEIKNGRMVNVQPLEAMFNSSFMVR 
ALHVVATAGMTMAFILAAIAAFKLLRHNHTEDRTYHTKALNLSMIVGFINTVLSMIAGDL
SAKFLHKVQPDKLAAYEWHYDTQSHANLVLFGVLNEKTHEVSGALEIPGLLSFLADNSFN
TKVKGLNEFPKNELPPMIVHYFFDLMVSMGIFCFIISGLYMLFLIVKKLRKYVTSNMMLY 
AILLTGPASMLAIEFGWFLTEMGRQPWIIRGYMRVSEAATQAGGITLVTTLFGLLYLLLL 
VTSAYVLIRMFKNQPAYKDVEKVIKERGETK* 

Sequence 2909

Contig_0499_pos_7860_8597

is similar to (with p-value 4.0e-71) 
>sp:sp|Q46807|ARCL_ECOLI CARBAMATE KINASE-LIKE PROTEIN 1.
>gp:gp|U28375|ECU28375_24 Escherichia coli K-12 genome;
approximately 64 to 65 minutes. NID: g8B7800.
>gp:gp|AE000370|AE000370_9 Escherichia coli K-12 MG1655
section 260 of 400 of the complete genome. NID: g2367170.

gtgaatcatttgaatttatttttgggaaaaaagcacattcttaaagatattacattctca 
ttacctataaatggtgaaatcattggtattgtaggacctaatggtgccggtaaatcatct

cttctcaaagcatttatcggagaatttaaagcgaccggtaaacatacattatatgatcac
cctatacatacacaacttagatatattacatatatcccacaaaaagctcatattgattta 
gattttcctataaaagtggatcaagtgatactatcaggatgttatgaagacattggatgg 
tttaaaaaagcaagtgtagttgaaaaaacgaaacttaaccaattacttaatgatttagaa
ctcgaccatatacgccatcgacaaattgcagaattaagtggtggacaattgcaacgtgtt 

cttgttgcaagagcacttatgagtaatagtgatatttattgtttagacgaaccttttgtt
ggcatcgacatttatagcgaacaacttattatgaagaaaattaaacatttaagacatatg 
ggtaaattgattttaattgttcatcatgacttatcaaaagcagatcaatattttgaccgc 
attttattattaaatcaatcgttgcagtttttagggccaacaaaagaagcattatcatct 
gaacgattaaatgcaactttcattaattacaaagatgattcgcttttaacactatcctca 

caagggagtacgaattag

Sequence 2910 

VNHLNLFLGKKHILKDITFSLPINGEIIGIVGPNGAGKSSLLKAFIGEFKATGKHTLYDH 
PIHTQLRYITYIPQKAHIDLDFPIKVDQVILSGCYEDIGWFKKASVVEKTKLNQLLNDLE
LDHIRHRQIAELSGGQLQRVLVARALMSNSDIYCLDEPFVGIDIYSEQLIMKKIKHLRHM
GKLILIVHHDLSKADQYFDRILLLNQSLQFLGPTKEALSSERLNATFINYKDDSLLTLSS 
QGSTN* 

Sequence 2911 
Contig_0499_pos_8598_9434

>gp:gp|X99127|SEABCTS_1 S. epidermidis gene encoding ABC
transport system. NID: g1617427.

atgttagatttcattaaccatttgcttagttatcaatttttaaatcgtgcattaatcaca 
tctattttagttgggattgtatgtggaacgatgggtagcattattgttttacgtggtctt 

tctttaatgggtgatgccatgagtcatgctgttttaccaggtgttgctttatctttctta
tttaatattccaatgtttatcggggcacttgtaacgggaatgcttgcaagtttgtttatt 
ggttttattacttcaaacagtaaaacaaaaccagatgctgcaataggaataagtttcact 
gcattcctagcatctggcgtcataattattagtttaatcaatagtacaacagatttatat 
cacattttatttggcaatttattagcaattacacatcaatcattttggacaacaattgtc 

attactgtactggttattttacttattattatcttttatagacctttaatgatttcaaca
tttgatgcaacttttagtcgtatgagcgggctgaacacaacattaattcactactttgtc
atgttattactcgcacttgtaactgttgcgagcatacaaacagttggaattatccttgta 
gt*.gr:tttactaatcactccagcttctacagcttttttaatcagtaaacaacttt.iwgcc 
atgatggtaattgcaagcataatcagcgtgataagttcgattatcggtctatattttagt

tatatatataatattccaagtggagcaactattgtaatctgtacctttatgatttatatt
gtaacgctatcaattactagaattaaaaataaacaaaaaaggagcgctttaacgtga 

Sequence 2912 

MLDFINHLLSYQFLNRALITSILVGIVCGTMGSIIVLRGLSLMGDAMSHAVLPGVALSFL 
FNIPMFIGALVTGMLASLFIGFITSNSKTKPDAAIGISFTAFLASGVIIISLINSTTDLY
HILFGNLLAITHQSFWTTIVITVLVILLIIIFYRPLMISTFDATFSRMSGLNTTLIHYFV 
MLLLALVTVASIQTVGIILVVALLITPASTAFLISKQLYAMMVIASIISVISSIIGLYFS 
YIYNIPSGATIVICTFMIYIVTLSITRIKNKQKRSALT* 


Sequence 2913 

Contig_0499_pos_0_864

>gp:gp|X99127|SEABCTS_2 S. epidermidis gene encoding ABC
transport system. NID: g1617427.

gtgagtgaaatggctaaaattgtagtagctttaggtggaaacgctttaggaaaatcacca
caagaacaacttgaattagtaaaaaatacagctaaatccctagtaggattaattactaaa 
ggtcacgaaattgtgattagtcacggtaatggaccacaagtaggaagtattaaccttggt 
ctgaattatgcagctgaacacgatcaaggtcctgcttttccatttgctgaatgtggcgct 
atgagtcaagcctacatcggctatcaacttcaagaaagtttacaaaatgaacttcattca 

atgggcatagataagcaagttgtcacactagttacccaagtagaagttgatgaaggcgat
ccagcttttaatagtccaagtaaacccatcggtctgttctacactaaagaagaagcaaat 
cgtattcaacaggaaaaaggttatcaatttgtagaagatgctggtcgaggttaccgtcgc 
gttgtaccatcaccacaaccaatatctattatcgaactggaaagtattaaaactctagta 
gaaaatgacacactcgtcatcgctgcaggtggaggtggtataccagtcattcgcgaacag 

catgatagctttaaaggtatagatgccgtcatcgataaagacaaaacaagtgcattatta
ggtgctgatattcactgtgatcaactcattattttaacagcgattgattatgtttatatc 
aactatcatactgaccaacaacaagcacttaaaacaacaaatatagatacgcttaaaaca 
tatattgaagaagaacaatttgccaaaggcagcatgctacctaaaatcgaatctgccatc 
tcctttattgaaaataatcctaac 


Sequence 2914 

VSEMAKIVVALGGNALGKSPQEQLELVKNTAKSLVGLITKGHEIVISHGNGPQVGSINLG 
LNYAAEHDQGPAFPFAECGAMSQAYIGYQLQESLQNELHSMGIDKQVVTLVTQVEVDEGD
PAFNSPSKPIGLFYTKEEANRIQQEKGYQFVEDAGRGYRRVVPSPQPISIIELESIKTLV 
ENDTLVIAAGGGGIPVIREQHDSFKGIDAVIDKDKTSALLGADIHCDQLIILTAIDYVYI
NYHTDQQQALKTTNIDTLKTYIEEEQFAKGSMLPKIESAISFIENNPN 

Sequence 2915 
Contig_0500_pos_3850_3113 

is similar to (with p-value 2.0e-97)

>sp:sp|P36839|ARGD_BACSU ACETYLORNITHINE AMINOTRANSFERASE
(EC 2.6.1.11) (ACOAT).
>pir:pir|S38431|S38431 N-acetylornithine aminotransferase -
Bacillus subtilis
>gp:gp|Z26919|BSCITB0_4 B. subtilis (168) DNA for argC-F
citrulline biosynthetic operon. NID: g408113.
>gp:gp|Z99109|BSUB0006_199 Bacillus subtilis complete genome
(section 6 of 21): from 999501 to 1209940. NID: g2633260.
>gp:gp|Z99110|BSUB0007_4 Bacillus subtilis complete genome
(section 7 of 21): from 1194391 to 1411140. NID: g2633472.

atgaaaaacatcatcgtaattaagctcggtggtatagctatagaaaatttaaacgacgca
tttatacaacaaattaatgcttggcaccttgaaaacaaaaaaataattattgttcacggt 
ggcggccaagtcatcagtaatttattaactaaaaacaatcattcaactattaaaattgat 
ggcatgagagtaactgctaaaaatgatttacctatcatatatgatgctttaattaacata 
gttggccatcaacttttagaaagacttaaagaatctaatttagaattttttcaatttaaa 

gaaaagataaaagaacttgtaagcgccgaatttttaaataaaaatatctatggttacgtt
ggtaaagttaaagaaatcaacacgatgctattagaaaaaatgttatcacgcgacataata 
ccaattatcactagtttgggtgtaaatgagcaaggggagtatcttaatgttaatgctgat 
catctcgccaccgcaattgctaagaaactaaaagtcgaaaaattagtatatatgactgat 
gtccctggtgtaattgaaaaagataaaacgcttgctactcttacaattaatgaagcaaaa 

acaaaaattgaaaataaaataattacaggcggaatgatacctaaaattgagagtgcaatc
caaacattagaatctggtgttgaatcgattttaattgcaaataatttacaaaaaggaaca 
atcataaggggtgattaa 

Sequence 2916 

MKNIIVIKLGGIAIENLNDAFIQQINAWHLENKKIIIVHGGGQVISNLLTKNNHSTIKID
GMRVTAKNDLPIIYDALINIVGHQLLERLKESNLEFFQFKEKIKELVSAEFLNKNIYGYV 

GKVKEINTMLLEKMLSRDIIPIITSLGVNEQGEYLNVNADHLATAIAKKLKVEKLVYMTD
VPGVIEKDKTLATLTINEAKTKIENKIITGGMIPKIESAIQTLESGVESILIANNLQKGT 
IIRGD*

Sequence 2917 

Contig_0500_pos_3110_1983

is similar to (with p-value 1.0e-34)

>sp:sp|P36840|ARGB_BACSU ACETYLGLUTAMATE KINASE (EC 2.7.2.8)
(NAG KINASE) (AGK) (N-ACETYLGLUTAMATE 5-PHOSPHOTRANSFERASE).
>pir:pir|S38430|S38430 N-acetylglutamate 5-phosphotransferase -
Bacillus subtilis
>gp:gp|Z26919|BSCITB0_3 B.subtilis (168) DNA for argC-F
citrulline biosynthetic operon. NID: g408113.
>gp:gp|Z99109|BSUB0006_198 Bacillus subtilis complete genome
(section 6 of 21): from 999501 to 1209940. NID: g2633260.
>gp:gp|Z99110|BSUB0007_3 Bacillus subtilis complete genome
(section 7 of 21): from 1194391 to 1411140. NID: g2633472.

atgagttatctttttaataattacaagcgtgacaatatagagtttgttgatgctaatcaa
aatgaattaattgataaagataataatgtctacctagatttttcgtcaggtataggtgtg 
acaaatctgggttttaatatggaaatttaccaagcagtttataatcaactgaatttaata 
tggcattcacccaatttatacctaagtagtatccaagaggaagtggctcaaaaattaatt 
ggtcaacgagattatttagctttcttttgtaatagcggaacagaagcgaatgaggcagct 

atcaaactcgcacgtaaagctactggtaagtcggaaattattgcttttaaaaagtctttt
cacggcagaacgtacggcgcaatgtctgcaacaggacagaagaaaattacagatcaattt 
ggtccggttgttcctggattcaaatttgctatttttaatgattttaattcatttaaatca 
ttaacttcaaataatactgctgctgtaattatagaaataattcaaggtgaatcaggagta 
ctacctgctgattctttatttatgaagcaattaaatgagtattgtaaacaaaaagatatc 

cttataattgtagacgaggttcaaacgggcataggtagaaccggtaagttatatgctcat
gaacattatcaattgtctccagatatcatcacattagctaaaggattaggtaatggcctt
cctattggagcaatgttaggcaaaaaaaatttaggtcatgcatttggctacggttctcat 
ggtacaacattcggtggaaatagattatcattggctgctgcaaaccaaacgctttctatc 
attaatgatgctgatttgctgaatgatgttcaatcaaaggggcaatttcttattgaaaac 

ttaagaaaaagtttagtaaataaaagaaatgtaattgaagtacgtggtgtaggtttaatg
gtaggaatagaggtcactaatgatcctagtcaagtagtgcgagaagctaaacgtatgggg 
ttaatcattttaacagctggtaaaaatgtgattaggttattaccgccattgaccatcact 
aaaaaacaattagaaaaaggtatagaaatattaactgaaatcatttga 

Sequence 2918

MSYLFNNYKRDNIEFVDANQNELIDKDNNVYLDFSSGIGVTNLGFNMEIYQAVYNQLNLI 
WHSPNLYLSSIQEEVAQKLIGQRDYLAFFCNSGTEANEAAIKLARKATGKSEIIAFKKSF 
HGRTYGAMSATGQKKITDQFGPVVPGFKFAIFNDFNSFKSLTSNNTAAVIIEIIQGESGV
LPADSLFMKQLNEYCKQKDILIIVDEVQTGIGRTGKLYAHEHYQLSPDIITLAKGLGNGL 

PIGAMLGKKNLGHAFGYGSHGTTFGGNRLSLAAANQTLSIINDADLLNDVQSKGQFLIEN
LRKSLVNKRNVIEVRGVGLMVGIEVTNDPSQVVREAKRMGLIILTAGKNVIRLLPPLTIT 
KKQLEKGIEILTEII* 

Sequence 2919 

Contig_0502_pos_13693_14094

is similar to (with p-value 7.0e-44) 

>sp:sp|P22806|BIOF_BACSH 8-AMINO-7-OXONONANOATE SYNTHASE
(EC 2.3.1.47) (7-KETO-8-AMINO-PELARGONIC ACID SYNTHETASE)
(7-KAP SYNTHETASE) (L-ALANINE--PIMELYL COA LIGASE).
>pir:pir|JQ0512|JQ0512 8-amino-7-oxononanoate synthase
(EC 2.3.1.47) - Bacillus sphaericus
>gp:gp|M29291|BACBIOXWF_3 B.sphaericus bioXWF operon genes,
complete cds. NID: g142592.
gtgcacctgattgtgctggcgtacctggttccactggtttacctggttctgctggcgtac 
ctggttccgctggtttacctggttccgctggtttacctggttccgctggtttacctggtt 
ccgctggtttacctggttctgctggtttacctggttccgctggtttacctggttctgctg
gcgtacctggttccgctggtttacctggttccgctggtttacctggttctgctggtttac 
ctggttctgctggcgtacctggttccgctggtttacctggttctgctggcgtacctggtt 
ccgctggtttacctggttccgctggtttacctggttccgctggtttacctggttctgctt 
ttgttggaccatactcaacaatttcgtcaacaggttgtttag

Sequence 2920 

VHLIVLAYLVPLVYLVLLAYLVPLVYLVPLVYLVPLVYLVPLVYLVLLVYLVPLVYLVLL
AYLVPLVYLVPLVYLVLLVYLVLLAYLVPLVYLVLLAYLVPLVYLVPLVYLVPLVYLVLL
LLDHTQQFRQQVV*

Sequence 2921 

Contig_0502_pos_15619_13529

is similar to (with p-value 3.0e-21) 
>sp:sp|P22818|BIOD_BACSH DETHIOBIOTIN SYNTHETASE (EC 6.3.3.3)
(DETHIOBIOTIN SYNTHASE) (DTB SYNTHETASE) (DTBS).
>pir:pir|JQ0506|JQ0506 dethiobiotin synthase (EC 6.3.3.3) -
Bacillus sphaericus
>gp:gp|M29292|BACBIODAYB_1 Bacillus sphaericus IFO3525
bioDAYB operon encoding dethiobiotin synthase (bioD),
adenosylmethionine-8-amino-7-oxononanoate aminotransferase
(bioA), biotin synthase (bioY) and bioB genes, complete cds.
NID: g142587.

gtggatgagatcgttcattatggtggcgaagaaatcaagccaggccataaggatgaattt 
gatccaaacgcaccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaat 

cctgatacaggcgaagtagtcacaccaccagtggatgatgtgacaaaatatggtccagtt
gatggagattcgattacgtcaacggaagaaattccattcgacaagaaacgtgaattcaat 
cctgatttaaaaccaggtgaagagcgtgttaaacagaaaggtgaaccaggaacaaaaaca 
att;ar.aa<.:accaacaactaagaacccattaacaggggaaaaagttggcgaaggtg'acca 
acagaaaaaataacaaaacaaccagtagatgaaatcacagaatatggtggcgaagaaatc 

aagccaggccataaagatgaatttgatccaaatgcaccgaaaggtagccaagaggacgtt
ccaggtaagccaggagttaaaaatcctgatacaggcgaagtagtcacaccaccagtggat 
gatgtgacaaaatatggtccagttgatggagatccgattacgtcaacggaagaaattccg 
tttgataaaaaacgcgaatttaatccaaacttagcgccaggtacagagaaagtcgttcaa 
aaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaacccattaacaggg 

gaaaaagttggtgaaggtgaaccaacagaaaaaataacaaaacaaccagtggatgagatc
gttcattatggtggcgaagaaatcaagccaggccataaggatgaatttgatccaaacgca 
ccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaatcctgatacaggc 
gaagtagtcacaccaccagtggatgatgtgacaaaatatggtccagttgatggagattcg 
attacgtcaacggaagaaattccgtttgataaaaaacgcgaatttgatccaaacttagcg 

ccaggtacagagaaagtcgttcaaaaaggtgaaccaggaacaaaaacaattacaacgcca
acaactaagaacccattaacaggagaaaaagttggcgaaggtgaaccaacagaaaaaata 
acaaaacaaccagtggatgagattgttcattatggtggtgaacaaataccacaaggtcat 
aaagatgaatttgatccaaatgcacctgtagatagtaaaactgaagttccaggtaaacca 
ggagttaaaaatcctgatacaggtgaagttgttaccccaccagtggatgatgtgacaaaa 

tatggtccgaaagttggtaatccaatcacatcaacggaagagattccatttgataagaaa
cgtgtatttaatcctgatttaaaaccaggtgaagagcgcgttaaacaaaaaggtgaacca 
ggr'acaaaaacaattacaacaccaatattagttaatcctattacaggagaaaaagttggc 
gaaggtaaatcaacagaaaaagtcactaaacaacctgttgacgaaattgttgagtatggt
ccaacaaaagcagaaccaggtaaaccagcggaaccaggtaaaccagcggaaccaggtaaa 

ccagcggaaccaggtacgccagcagaaccaggtaaaccagcggaaccaggtacgccagca
gaaccaggtaaaccagcagaaccaggtaaaccagcggaaccaggtaaaccagcggaacca 
ggtacgccagcagaaccaggtaaaccagcggaaccaggtaaaccagcagaaccaggtaaa 
ccagcggaaccaggtaaaccagcggaaccaggtaaaccagcggaaccaggtaaaccagcg 
gaaccaggtacgccagcagaaccaggtaaaccagtggaaccaggtacgccagcacaatca 

ggtgcaccagaacaaccaaatagatcaatgcattcaacagataataaaaatcaattacct
gatacaggtgaaaatcgtcaagctaatgagggaactttagtcggatctctattagcaatt 
gtcggatcattgttcatatttggtcgtcgtaaaaaaggtaatgaaaaataa 

Sequence 2922 






VDEIVHYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPDTGEVVTPPVDDVTKYGPV 
DGDSITSTEEIPFDKKREFNPDLKPGEERVKQKGEPGTKTITTPTTKNPLTGEKVGEGEP 

TEKITKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPDTGEVVTPPVD 
DVTKYGPVDGDPITSTEEIPFDKKREFNPNLAPGTEKVVQKGEPGTKTITTPTTKNPLTG 
EKVGEGEPTEKITKQPVDEIVHYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPDTG
EVVTPPVDDVTKYGPVDGDSITSTEEIPFDKKREFDPNLAPGTEKVVQKGEPGTKTITTP
TTKNPLTGEKVGEGEPTEKITKQPVDEIVHYGGEQIPQGHKDEFDPNAPVDSKTEVPGKP 
GVKNPDTGEVVTPPVDDVTKYGPKVGNPITSTEEIPFDKKRVFNPDLKPGEERVKQKGEP 
GTKTITTPILVNPITGEKVGEGKSTEKVTKQPVDEIVEYGPTKAEPGKPAEPGKPAEPGK 
PAEPGTPAEPGKPAEPGTPAEPGKPAEPGKPAEPGKPAEPGTPAEPGKPAEPGKPAEPGK
PAEPGKPAEPGKPAEPGKPAEPGTPAEPGKPVEPGTPAQSGAPEQPNRSMHSTDNKNQLP 
DTGENRQANEGTLVGSLLAIVGSLFIFGRRKKGNEK* 

Sequence 2923 
Contig_0502_pos_12781_12110

is similar to (with p-value 2.0e-34) 

>gp:gp|U67763|PBU67763_1 Plasmodium berghei thrombospondin
related adhesion protein (PbTRAP) gene, complete cds.
NID: g1906578.

atgaatatatttgtaactggaacgaatactgacatcggtaaaacgtatgtcactaaatat
ctctataaggcattaagaacgaggggatatcgggtatgtattttcaaaccttttcaaact 
gaagaaattggtggaggtagatacccagatttagaaatttataaaaacgaatgcgattta 
gactatgacgttacgtctctttacacattcaaagatccagtttcaccacatttagcattc 
aaaattgaaaggcatcagcaattgaaccatcaaacaatgatagacaaactcgaatcacta 

gaagcacaattcgatatgattctcattgaaggtgcaggtggtattgcagtgcctatctat
gaatacagtgaccatttttatatgacaacagatttaattaaagacacatcggatttcatt 
gtgagtgtcttaccttcaaagttaggtgcgattaatgatgccatcgttcaccagaaatat 
attgatcatcaagaacttcccccgaatgtgttaatcatgaataactatactgatagtgct 
attgaacaggataatttacataccattgaaaaattaatacacaagtctgtttatacgttg 

ggtcatcaagcgactcaagaaagcttttccgaagcatttatacaacgaattataggagga
tccaatggctaa 

Sequence 2924 

MNIFVTGTNTDIGKTYVTKYLYKALRTRGYRVCIFKPFQTEEIGGGRYPDLEIYKNECDL 
DYDVTSLYTFKDPVSPHLAFKIERHQQLNHQTMIDKLESLEAQFDMILIEGAGGIAVPIY
EYSDHFYMTTDLIKDTSDFIVSVLPSKLGAINDAIVHQKYIDHQELPPNVLIMNNYTDSA 
IEQDNLHTIEKLIHKSVYTLGHQATQESFSEAFIQRIIGGSNG*

Sequence 2925
Contig_0502_pos_10751_9618

is similar to (with p-value 8.0e-36) 

>gp:gp|U67763|PBU67763_1 Plasmodium berghei thrombospondin
related adhesion protein (PbTRAP) gene, complete cds.
NID: g1906578.

atggacattaaagcacagttaaaacagattcaagataaaggtttatatagagagcttcag
ccgattcagtccgtagaaaaacaatatatttatatcaatgaccaatcttatattaatttt 
acttcgaacgattatctcggtataggacaagttgaatatcaacctcaaaatttcttagat 
tttataaagacatatagtatccatctatcaagttctagattagtgagtggcaattcagtt 
gtttatcagcaattagaacaggaaattagcgagcattttaattttgaagacgccttaatt 

tttaatagtggttacgatgcgaatttggcggtatttaatatttttaaaaataataatatt
gttattttttctgatcaacagaatcatgccagtataatagacggtattaaattaagtggt 
ttatcaaaagtgatttatcaacatttaaactatgatgacctggaaagtcatttagcacgg 
cacaccaatccagatgttcaaaaagtaattgtctctgatagtgtgttttctactaatggc 
actaaagcagatattaataggctagtacatctcaagcaacgttacaatgcgattttaatt 

attgacgcatctcatagtttaggattaaatctctttgagtatcatgcagacattgacata
gttacttcaagtttatctaaagcgtggggagcccatggtggcgtcatattcagttcaaaa 
gatataaaagatttaatcattaataaaggtcgttcgcttatctactcgagtagtttacct
agctatcatttgtattttattcaagtgagcttacaacatgtgattgaagatacatacaga 
cgagagaagttgaatgtacttagtgaatattttaatcaccaattcatggaattatttccc
gatcaaccattatctaatacacctatcaaaaatatcgtttgtgatagtttggcttcagca
caagcacaatacgacatgttatttgaacatggtatatttgtcagttatttaaggtatcca 
acagtgtcacagctaacattaagaatttcattatcctattttcatgacacagatgatatt
gatcgacttttcaatgtaatgaaacaatacgatgaaggtgatagctatgtatag 


Sequence 2926 

MDIKAQLKQIQDKGLYRELQPIQSVEKQYIYINDQSYINFTSNDYLGIGQVEYQPQNFLD 
FIKTYSIHLSSSRLVSGNSVVYQQLEQEISEHFNFEDALIFNSGYDANLAVFNIFKNNNI 
VIFSDQQNHASIIDGIKLSGLSKVIYQHLNYDDLESHLARHTNPDVQKVIVSDSVFSTNG 
TKADINRLVHLKQRYNAILIIDASHSLGLNLFEYHADIDIVTSSLSKAWGAHGGVIFSSK
DIKDLIINKGRSLIYSSSLPSYHLYFIQVSLQHVIEDTYRREKLNVLSEYFNHQFMELFP
DQPLSNTPIKNIVCDSLASAQAQYDMLFEHGIFVSYLRYPTVSQLTLRISLSYFHDTDDI 
DRLFNVMKQYDEGDSYV* 

Sequence 2927

Contig_0509_pos_6421_7818

>sp:sp|P13408|UHPT_ECOLI HEXOSE PHOSPHATE TRANSPORT PROTEIN.
>pir:pir|A30395|MMECHP hexose phosphate transport protein
uhpT - Escherichia coli
>gp:gp|M17102|ECOUHP_5 E.coli uhp operon encoding UhpA, UhpB,
UhpC, and UhpT protein, (encoding hexose phosphate transport
protein), complete cds, and an ilvBN operon encoded protein,
3' end. NID: g148110.
>gp:gp|M89479|ECOUHPABCT_4 Escherichia coli uhpABCT operon
encoding hexosephosphate utilization protein (uhpA) gene,
complete cds, and hexosephosphate transport protein (uhpB,
uhpC, uhpT) genes, complete cds. NID: g148116.
>gp:gp|AE000444|AE000444_5 Escherichia coli K-12 MG1655
section 334 of 400 of the complete genome. NID: g2367258.

atgataggggtgttagggatgaacttttttgatattcataaaatgccaaacaaagggata

ccattagctgtacaacgcaaattatggctcagaaactttatgcaagcgttttttgtcgta
ttctttgtttacatggcgatgtatttaattcgaaacaattttaaagcggcacaaccgtta
ttaaaagaagaaatcggattaacaacattagaactaggttatataggattagcgtttagt 
attacttacggtttaggaaaaacaatactcggttatttcgttgatgggcgtaatacgaaa 
cgtattatttccttcttattaatattatctgcgattacagtacttattatgggatttgta 

ttaagttatttcggttctgtgatggggctattaattgtattgtgggggcttaacggtata
tttcaatctgtgggtgggcctgcaagttactcaacgatttcaaggtgggcgcctcgaaca
aagcgcggtcgttatttaggcttttggaatacatcacataacattggtggtgctattgct
ggtggtgtcgcactttggggcgcgaatacatttttccacggtaatgtggttggaatgttt 
atttttccttccgtcatcgctttaatcattgggattgtgacattatttattggtaaagat 

gatccagaggaattaggttggaatcgtgccgaagaaatttgggaagagcctatcgaccaa
gaaaacattgattctcaaggtatgactaaatgggatatctttaaaaaatatatccttgga 
aatcctgtgatttggattttgtgtatctctaatgtttttgtatatatcgtgcgtattggt 
attgataactgggcaccgctatacgtatcagagcatttacattttaataaaggtgatgcg 
gtgaatactattttttactttgaaataggtgcattagtagctagtttattgtggggctat 

atctcagatttattaaaaggtcgtcgtgcgattgtagcgattggatgtatgtttatgatc
acctttgttgtactcttttataccaatgcaacaagcgtgacaatggtcaatatttctcta
tttgcattaggcgctttaatcttcggtccacagttactcattggtgtatctctgactggc
tttgttcctaaaaatgcaattagtgtcgctaacggtatgacaggttcatttgcatatcta 
ttcggggattcaatggctaaagtgggtctggctgcaatcgctgatccaacacgtaatggt 

ttriao.tatttttgggtatacgttgagtggttggacagatgtctttattgtattctatgta
gctttatccttaggaatgatattattagccattgttgcttattacgaagaaaagaaaatt
agaaaattaaaaatttaa 

Sequence 2928 

MIGVLGMNFFDIHKMPNKGIPLAVQRKLWLRNFMQAFFVVFFVYMAMYLIRNNFKAAQPL
LKEEIGLTTLELGYIGLAFSITYGLGKTILGYFVDGRNTKRIISFLLILSAITVLIMGFV
LSYFGSVMGLLIVLWGLNGIFQSVGGPASYSTISRWAPRTKRGRYLGFWNTSHNIGGAIA
GGVALWGANTFFHGNVVGMFIFPSVIALIIGIVTLFIGKDDPEELGWNRAEEIWEEPIDQ
ENIDSQGMTKWDIFKKYILGNPVIWILCISNVFVYIVRIGIDNWAPLYVSEHLHFNKGDA
VNTIFYFEIGALVASLLWGYISDLLKGRRAIVAIGCMFMITFVVLFYTNATSVTMVNISL
FALGALIFGPQLLIGVSLTGFVPKNAISVANGMTGSFAYLFGDSMAKVGLAAIADPTRNG
LNIFGYTLSGWTDVFIVFYVALFLGMILLAIVAYYEEKKIRKLKI*

Sequence 2929

Contig_0517_pos_10178_9693 

is similar to (with p-value 2.0e-26) 

>sp:sp|P55978|GREA_HELPY TRANSCRIPTION ELONGATION FACTOR GR
EA (TRANSCRIPT CLEAVAGE FACTOR GREA). >gp:gp|AE000596|HPAE00
0596_20 Helicobacter pylori section 74 of 134 of the complet
e genome. NID: g2313982.

atgagatttatggaaaaccaaaaacaatatcctatgactcaagaaggttatgagaaactt 
gaacaagaattagaagaattaaaaacggttaaaagacctgaggtagttgaaaaaataaaa 
gtagctcgttcatttggagacctatctgagaactctgaatatgatgctgctaaagatgaa 

caaggctttattgaacaagatatacaacgtattgaacatatgattagaaatgcgttaatc
attgaagataacggtgataacaatgtagttcaaattggtaaaacagttacttttattgaa 
ttacctggagatgaagaagaaagttatcaaatcgttggttctgctgaagctgacgcattt 
aaaggaaaaatttctaacgaatctccaatggcaaaagcactaatcggtaaaggattaaat 
gatcaagtacgtgttccacttcctaacggtggcgaaatgaatgttaaaatcgttgaaatt 

aaataa

Sequence 2930 

MRFMENQKQYPMTQEGYEKLEQELEELKTVKRPEVVEKIKVARSFGDLSENSEYDAAKDE 
QGFIEQDIQRIEHMIRNALIIEDNGDNNVVQIGKTVTFIELPGDEEESYQIVGSAEADAF
KGKISNESPMAKALIGKGLNDQVRVPLPNGGEMNVKIVEIK*

Sequence 2931 

Contig_0517_pos_9512_8913

is similar to (with p-value 1.0e-57)

>sp:sp|P24247|PFS_ECOLI PFS PROTEIN (P46). >pir:pir|S45227|
S45227 purine nucleoside phosphorylase homolog - Escherichia
coli >gp:gp|D26562|ECO82K_47 Escherichia coli genome, 2.4-4
.1 min region (110,917-193,643 bp from 0 min). NID: g473770.
>gp:gp|U70214|ECU70214_10 Escherichia coli chromosome minut
es 4-6. NID: g1552727. >gp:gp|AE000125|AE000125_6 Escherichi
a coli K-12 MG1655 section 15 of 400 of the complete genome.
NID: g1786348. >gp:gp|U24438|ECU24438_1 Escherichia coli MT
A/SAH nucleosidase gene, complete cds. NID: g2981266.
atgcgctatgcgtatggttcagaaaaggttttaccgttatctaaaaaaacatctcatcat 

gctagaatatatattggtcagccaaccaaaatattcaacacgaggagatgcttttttatg
tcatctgacacaaacagtttagcacatacaaaatggaattgtaagtatcacatagt:;ttt 
gtaccaaaatatagaagacaagtgatatacggaaaaatcaaaagagatattggagttatt
ttacgtcaactatgtgaaagaaaaggcgtagaaataatagaagcagaagcatctaaagat 
catattcatatgttagttagtattccacctaaattaggagtatcctcatttgttggatat 

ttaaaaggtaaaagcagtttaatgatttttgatagacatgctaatttaaaatatagatat
ggaaatagaaagttttggtgtaaaggattttatgtagatacagtaggtagaaataaaaag 
gtaattgaaaattacattcgtaatcaattacaagaagatatcgttgcagatcaaatttca 
atggaagaatacctagatccctttacaggagaagaaattaaaaaaagacgaaaaaaatag 

Sequence 2932

MRYAYGSEKVLPLSKKTSHHARIYIGQPTKIFNTRRCFFMSSDTNSLAHTKWNCKYHIVF 
VPKYRRQVIYGKIKRDIGVILRQLCERKGVEIIEAEASKDHIHMLVSIPPKLGVSSFVGY 
LKGKSSLMIFDRHANLKYRYGNRKFWCKGFYVDTVGRNKKVIENYIRNQLQEDIVADQIS 
MEEYLDPFTGEEIKKRRKK* 


Sequence 2933 
Contig_0517_pos_4 627_4 100 
is similar to (with p-value 1.0e-41)
>gp:gp|AF006000|AF006000_5 Bordetella pertussis D-3-phospho
glycerate dehydrogenase homolog (serA) and Brgl (brgl) genes
, complete cds. NID: g2290988.

atggatcttactccaaatgaaatttataacttagttatatatcaattaggtgcgttaagt 
ggcttttgtaaaatcaatcatgtaaaaatgatgcatgttaaacctcatggtgccctttat 
caaatgggggctagaaataaagaaattgcacatgcaattgctcaagcagtttttgatttc
gactcaaatctaattttcgtcggcttagcgaatacattacttatttcggaagctgaatta 
gtggggcttaaggtagcttcggaagtatttgctgaccgtcgttatgaagatgacggacaa 
ttggtaagtagaaaaaaaaccgatgccactatcactaatactgacgaagcaatccaacaa 
gcattaaaaatggttttggaaaataaagttgtaagtaaaaatggaaaaatcatcgatttg 
aaagctgatacaatttgtgttcacggagatggaaaacacgcattagaatttgttacgcaa
attagaaatgaattaatgaaagaaggcattgatattcaatccttatag 

Sequence 2934 

MDLTPNEIYNLVIYQLGALSGFCKINHVKMMHVKPHGALYQMGARNKEIAHAIAQAVFDF
DSNLIFVGLANTLLISEAELVGLKVASEVFADRRYEDDGQLVSRKKTDATITNTDEAIQQ
ALKMVLENKVVSKNGKIIDLKADTICVHGDGKHALEFVTQIRNELMKEGIDIQSL* 

Sequence 2935 

Contig_0517_pos_2645_1959

is similar to (with p-value 2.0e-52)

>gp:gp|AF025380|AF025380_1 Salmonella typhimurium IS200 tra
nsposase (tnpA) gene, complete cds. NID: g2555163. >gp:gp|Y0
9990|STFLGLIS2_2 S. typhi flgL gene, gene encoding putative I
S200 transposase and gene encoding putative RNAseE-like prot
ein. NID: g2765044. >gp:gp|Y09991|STIS2T157_1 S. typhi encodi
ng putative IS200 transposase, 1575bp. NID: g2765048. >gp:gp
|Z54217|STISFLIBC_3 S. typhimurium fli[B,C] genes and inserti
on sequence IS200. NID: g1150641. >gp:gp|U44749|STU44749_1 S
almonella typhimurium putative IS200 transposase gene, compl
ete cds. NID: g1177216. >gp:gp|AF093749|AF093749_2 Salmonell
a typhimurium strain LT2 NADP+-linked malic enzyme (maeB), p
artial cds; insertion element IS200 transposase, complete cd
s; eut operon, complete sequence; and unknown genes. NID: g3
885908. >gp:gp|L25848|STYIS200A_1 Salmonella typhimurium IS2
00 insertion sequence from SARA17, partial. NID: g439618.

atgataggaattattggagcaatggaagaagaagtgacgattttaaagcgtaaattgaat 
gatatgaatgaaataaatattgcgcatgttaaattttatgttggcaagctaaaccacaaa 

gaggtggttttaacacaaagtggtataggtaaagttaatgcttctatctcaacgactttg
ttaatagaaaaatttaatccagaagtcgtcattaatactggatcagcaggtgcactagat 

caaacactatctattggagatatattagtgagtaatcatgtattatatcatgatgctaat
gctacagcgtttggttatgaatatggacaaatacctcaaatgcctaaaacttatactact 
gatcctactttgttgaaaaaaacaatgcatgtattagaacaacaacaactgaatggtaaa 
gtaggtatgattgttagtggtgatagttttataggtagctcagaacagcgacaaaaaatt 
aagcaacaatttccagaagctatggctgtcgaaatggaggcaactgcaattgcgcaaaca 

tgttatcaatttaaagtaccatttatcgtaactagagctgtttctgatttagcaaacggt
aaagccgatatttcttttgaagaatttttagataaagcagctttatcatctagtgagaca 
gtttcattattagtagaatcattataa 

Sequence 2936 

MIGIIGAMEEEVTILKRKLNDMNEINIAHVKFYVGKLNHKEVVLTQSGIGKVNASISTTL
LIEKFNPEVVINTGSAGALDQTLSIGDILVSNHVLYHDANATAFGYEYGQIPQMPKTYTT
DPTLLKKTMHVLEQQQLNGKVGMIVSGDSFIGSSEQRQKIKQQFPEAMAVEMEATAIAQT
CYQFKVPFIVTRAVSDLANGKADISFEEFLDKAALSSSETVSLLVESL*

Sequence 2937

Contig_0518_pos_1926_964 

is similar to (with p-value 1.0e-49)

>sp:sp|P45510|DHAK_CITFR DIHYDROXYACETONE KINASE (EC 2.7.1.
29) (GLYCERONE KINASE). >gp:gp|U09771|CFU09771_2 Citrobacter
freundii DSM 30040 cyclopropane fatty acid synthase (cfa) g
ene, partial cds, dihydroxyacetone kinase (dhaK), glycerol d
ehydrogenase (dhaD), transcriptional activator (dhaR), 1,3-p
ropanediol dehydrogenase (dhaT), glycerol dehydratase (dhaB)
, glycerol dehydratase (dhaC) and glycerol dehydratase (dhaE
) genes, complete cds. NID: g1229153.

atgaaaaagttaattcaagataaaaacacaattttaaaagatatgcttgatggaattaca 
gtttcaaacaacgatgttgaagttgtatctgacactattgttgttagaaagcataaaaaa 
caatcaggtgttgcactcgtttctgggggcggcagtggacatgaacctgcacacgcagga 

tttgtagcagaaggcatgctcgatgcagctgtatgtggagaaatcttcacttcacctaca
cctgataaaatattagatgccattaaagctgtggacaatggtgacggcgttctacttgtt 
attaaaaactatgcaggagacgttatgaactttgaaatggctcaagaaatggctcaaatg 
gaagatattaaagttgaaagtgttattgtcagagatgatattgctatttctgacccggaa 
aaacgccgtggtgtagcgggtacagtatttgtgcataaatatgctggatacctagctgaa 

aaaggtgttgcacttgatgaaatcaaatctaaagttgaggcacttttaccagatattaaa
agtattggtatggcattaacgcctccaatggtgcccacaactggtaaaaacggtttcgat 
attgaagacaatcaaatggaaattggtatcggtattcacggtgaaaaaggtttacatcgt 
gaagacgtacaacctattaacgtgattgttgaacgtttactcgatcaattatacaaagaa 
attgagaaaaaacctttaatcgtaatggttaatggtatgggtggtacgccactatcagaa 

ttaaatatagttactaaatatctagatgaacaattcaatcagaatgatattggtgttaaa
caatggttcgtaggtgactatatgacagcgttagacatgcaaggcttctctataactgta 
ctccccttcagtgaagaattgagtgaagctttagctgcacctacagcaagtaaatatttc 
taa 

Sequence 2938

MKKLIQDKNTILKDMLDGITVSNNDVEVVSDTIVVRKHKKQSGVALVSGGGSGHEPAHAG
FVAEGMLDAAVCGEIFTSPTPDKILDAIKAVDNGDGVLLVIKNYAGDVMNFEMAQEMAQM
EDIKVESVIVRDDIAISDPEKRRGVAGTVFVHKYAGYLAEKGVALDEIKSKVEALLPDIK
SIGMALTPPMVPTTGKNGFDIEDNQMEIGIGIHGEKGLHREDVQPINVIVERLLDQLYKE
IEKKPLIVMVNGMGGTPLSELNIVTKYLDEQFNQNDIGVKQWFVGDYMTALDMQGFSITV
LPFSEELSEALAAPTASKYF* 

Sequence 2939 
Contig_0519_pos_4 309_3827 
is similar to (with p-value 2.0e-25)

>gp:gp|AF008183|AF008183_1 Populus balsamifera subsp. trich
ocarpa X Populus deltoides 4-coumarate:CoA ligase 2 (4CL2) m
RNA, complete cds. NID: g2911796.

atgaataaagtgctatttgagcaatatcctcattttgttaaaaatcacattcatattcct 
gatggcttatcagaaaatctagaagcagaagcggaacgatataacaatttattagatgaa
agagggccaatcgatattcaaattttaggaattggagaaaatggtcacattggttttaat 
gaaccagggactgacttcaatagtgaaacacatgtggtgaacttaacagaaagcaccata 
aaagcaaatagtcgattttttgacaatgaaaaggatgttcctagacaagcagtttcaatg 
ggggtaaaaagtattttaaaagcaaaaaggattatcctactcgcatttggtccaaagaaa 
aaagaggctataagtaaactgttaaatgaacaggttaccgaagatgtacctgcgaccatt
ttacacacacaccctaatgttgaagtttatgtagacgatgaagcagcgccagattgttta 
taa 

Sequence 2940 

MNKVLFEQYPHFVKNHIHIPDGLSENLEAEAERYNNLLDERGPIDIQILGIGENGHIGFN
EPGTDFNSETHVVNLTESTIKANSRFFDNEKDVPRQAVSMGVKSILKAKRIILLAFGPKK 
KEAISKLLNEQVTEDVPATILHTHPNVEVYVDDEAAPDCL* 

Sequence 2941 
Contig_0519_pos_1586_288

is similar to (with p-value 2.0e-22) 
>sp:sp|Q04802|NAG1_CANAL GLUCOSAMINE-6-PHOSPHATE ISOMERASE
(EC 5.3.1.10) (GLUCOSAMINE-6-PHOSPHATE DEAMINASE). >pir:pir
|A46652|A46652 glucosamine-6-phosphate isomerase (EC 5.3.1.1
0) - yeast (Candida albicans) >gp:gp|L07558|YSANAG1A_1 Candi
da albicans glucosamine-6-phosphate deaminase (NAG1) mRNA, c
omplete cds. NID: g170885.

gtgtttgaagatcgacatctaacgtatggagaattaagtaaagaaatttatcaggctagt 
atgcgctataaagaagtaaaattaaacaaaaaagtaggtctaatggatgaacatcctgta
aataatattattaactattttgcggtacatcaaagaggtggaattccttgcatttttaat 
catcaatggagtaatgaaaggatacatcaacttgtaaaaagttatgacatacaatggtta 
attaaagataatcatcttacctcaaatcatgataactcaatttataatgatgaggttatc 
ccacgtaatgttatacatataggtttcacgtcaggaactacaggtttacccaaagcgttt 

tatagaaatgaacattcttggatagtttcttttaaggaaaatgagaaattactccagcat
tgtgaagaaaccattgtagcaccgggtactttatcacattcactttcattgtacgcatgt
atttatgcattaagtactggaaaaacatttataggtcaaaaaaattttaatccactatct 
cttatgcgtcttattaatcaattgaacaaaacgacagcaatatttgtagtgccaacgatg 
gtacaacaacttatttcaactcaacgacattgttcatcgattaaaagtattttgagtagt 

ggtgctaaacttacattgcaacagtttcaacaaatcagaaatttatatccacaagcaaat
tt.*-Btaq?^attttttgggacatctgaagcaagttttataagctacaattttaacc.'.atca 
tctcutc^ctaattctgttggtaaacttttccctcatgtcgagacacgattattaa'C.tcaa 
gatgatgatgcagtaggattattagccgttagaagtgaaatggtgtttagtggttatgtt 
ggacaaagcaatcaagagggggcatggattaaaacaggcgacttcgcttatattaaaaat 

caacatttgtttttagtaggtagagagagtgatcgtattatagttggggggattaatgta
tatccaacagctattgaaagcttaattatggatattgaaggcattgatgaggcccttgtc 
attggtataccacatgctaaatttggagaaatagcgatattgctttattcaggtaaagta 
caattgaattaccgacaaattaaatcttttttaatgaaacatctttcaagacaagaagtt 
ccatcaaaattaaagaaaattgaccatatgatttatacagaatcaggaaagattgctaga 

aaagagatgaaaaataaatttattaatggagagttataa

Sequence 2942 

VFEDRHLTYGELSKEIYQASMRYKEVKLNKKVGLMDEHPVNNIINYFAVHQRGGIPCIFN 
HQWSNERIHQLVKSYDIQWLIKDNHLTSNHDNSIYNDEVIPRNVIHIGFTSGTTGLPKAF 
YRNEHSWIVSFKENEKLLQHCEETIVAPGPLSHSLSLYACIYALSTGKTFIGQKNFNPLS

LMRLINQLNKTTAIFVVPTMVQQLISTQRHCSSIKSILSSGAKLTLQQFQQIRNLYPQAN 
LIEFFGTSEASFISYNFNQSSPANSVGKLFPHVETRLLNQDDDAVGLLAVRSEMVFSGYV 
GQSNQEGAWIKTGDFAYIKNQHLFLVGRESDRIIVGGINVYPTAIESLIMDIEGIDEALV 
IGIPHAKFGEIAILLYSGKVQLNYRQIKSFLMKHLSRQEVPSKLKKIDHMIYTESGKIAR 
KEMKNKFINGEL*

Sequence 2943 

Contig_0523_pos_0_303

is similar to (with p-value 1.0e-31)
>sp:sp|P14638|TRPB_METVO TRYPTOPHAN SYNTHASE BETA CHAIN (EC
4.2.1.20). >gp:gp|M35130|MVOTRPBA_2 M.voltae tryptophan syn
thase operon (trp) genes, complete cds. NID: g150070.

atgaaaattcaaacagaagtagatgaattgggctttttcggtgaatatggtggccaatat 

gtacctgaaacattgatgccagctattattgaacttaaaaaagcatatgaggacgcgaaa 
tcagatactcacttcaagaaagaatttaattattatttaagtgaatatgttggtagagaa

acgcctttaacatttgctgaatcatacacaaaattgttaggtggtgccaaaatatatctt 

aaaagagaagacttaaatcacactggtgctcataaaattaataacgcgataggacaggcA 

TAA 

Sequence 2944

MKIQTEVDELGFFGEYGGQYVPETLMPAIIELKKAYEDAKSDTHFKKEFNYYLSEYVGRE 
TPLTFAESYTKLLGGAKIYLKREDLNHTGAHKINNAIGQA+ 

Sequence 2945 
Contig_0527_pos_178_1065

>sp:sp|P22983|PODK_CLOSY PYRUVATE, PHOSPHATE DIKINASE (EC 2.
7.9.1) (PYRUVATE, ORTHOPHOSPHATE DIKINASE).

atgggagcaacagcatttgtctataacggtcgtttccaccctgaaacatatctcgagtta 
cttcaaaattatcaaattaatgttctatgttgtacaccaacagaatatcgtatgatggct 
aaacttagtcatttagaacagtacaatttagagtatttacacagtgcggtgtctgcgggt
gaacctttaaatcgagaagttgttgaacaatttaaacgtcattttaatattactgttcga 
gatggatatggacaaaccgaaagtacattgttgatcggatttctaaaagatactgaacca 
cgtatgggttctatgggcaaaggtatacctggtagttttgttactgtcattgacgatgat 
ggtaaagaggttggtccaaatgttaaaggtaatatcgccgtgcctttagacttaccggct
ttatttaaaggttactttaaagatgaagcacgcacaaaagcagcttcaacaggtgattat 
tatgttactggagaccaagctcatattgataatgatggttatttctggttcgaaggtcgc
cgtgacgatattatcattagttcaggatataccattggacctttcgaggtagaagacgca 
ctaacaaatcacgcagctgttaaagaatgtgcagttgttgcaagtcctcatgacattcgt 
ggaaatattgttaaagcatttatcatcttgcaagatgattatgaagcaagtgatgagtta
atccaagaattacaagtattttgtaaaaatgaagtagcaccgtataaatatccaagagca 
attgaatttgttgaacatctaccaaaaacaaattcaggtaagatacgtcgtgttgaatta 
cgtgacgcagaaataaaaaaatataaacaacaagattcatcacattaa 

Sequence 2946

MGATAFVYNGRFHPETYLELLQNYQINVLCCTPTEYRMMAKLSHLEQYNLEYLHSAVSAG 
EPLNREVVEQFKRHFNITVRDGYGQTESTLLIGFLKDTEPRMGSMGKGIPGSFVTVIDDD 
GKEVGPNVKGNIAVPLDLPALFKGYFKDEARTKAASTGDYYVTGDQAHIDNDGYFWFEGR 
RDDIIISSGYTIGPFEVEDALTNHAAVKECAVVASPHDIRGNIVKAFIILQDDYEASDEL
IQELQVFCKNEVAPYKYPRAIEFVEHLPKTNSGKIRRVELRDAEIKKYKQQDSSH*

Sequence 2947 
Contig_0527_pos_1455_0
is similar to (with p-value 3.0e-67)
>gp:gp|AF068246|AF068246_1 Mus musculus SA protein mRNA, co
mplete cds. NID: g3928675.

atgaaagacttacttggtggtaaaggtgccaatctttcagagatgaagagactcggacta
ccagr.ac'jagatggttttacaattacgactgaagcttgtattacatatttaaaac*aaat 
gaagaactacctacagaagtaaagacacaattaattgatcatttagcagctttttctaaa 

cgaacaggaaaagccttttcctctgatgataacttgttattagtatcagtacgtagtggt
gcaaaaatttccatgccaggaatgatggatacaattcttaatttaggacttaatgatgat 
aatgtaaaaaagcttgttgacaaaacaaatgatgcacgatttgcatatgattgttaccgt
cgtttactacaaatgtttggagaggttgtttatggtattccaatgacagctttcgataca 
tattttaatgattttaaaacaaagcatcgttatcaaaatgatgctgaaattccggcagaa 

ggactacaaactatatgtgaaaaatataaagaaatctatgtagaagaggcatataaacct
tttccccaagaaccgttaaagcaattagaagaagcaattgaagcagtatttaaatcttgg 
gataatgatcgtgcacgtgtatatagagatttaaatgatattccacatgatattggtaca 
gccgtaaatattcaggaaatggtatttggtaatagtggtgaaaatagtggtacgggtgta 
gcatttacaagaaatccagttactggagaaaatcatttattcggagagtacttacttaat 

gctcaaggtgaagatgttgttgcaggcattcgtacccctaaggatattgacactttaaaa
caacaaatgccagatgtacatcaagagtttgttgatgtaaccaaacaacttgaaaaacat
tacaaagatatgcaagacatagaatttacaattgaaaatggtaaactttatttattacaa 
acacgtaacggaaaacgtactgctaaagctgcaataaaaattgctgtggatttagttcac 
gagcaattaattacacgtgaagaagcagtatcaaaagtagaggtaaaatcaatagaccaa 

ttattacatcctaattttaatgaagaatcattaaagcaagcgacagtggtttctaaaatg
ggcttaccagctagtcccggtgcagcaacaggaaaagtagtcttctctgctgaagaagca 
aaacttcaagctgaaaatggtaataaagtagtgttaatgcgaccggaaacatcacr'^gaa 
gatattgaggggatggtagcaagtgaagcaatcgtaacaactcatggtggtatga .. atca 
cacgctgctgttgtagcaagaggaatgggcaaatgttgtgtgacaggatgttcgaatgta 

gagatagatacagtgaacaaaacagtatattatcctgaaggtgaattacatgaaggggat
atcgtttctgtagatggttcagctggtgatttatatttaggagcaattgaaacagtcaat 
gctgaacatagtgaagagttcgatcaatttatgacttggtctgaagagattgcaagactg 
caagttagaatgaatgctgaaacaccacaagatataaaagctggatataattttggttct 
aaaggaataggtttagttcgtactgagcacatgttctttggccctgaacgtttaatagaa 

atgcgccgttttatcttagcttcaaatcatgacgaacgtgtacaagctttagaaaaaatt
aaaacataccaagtagaagattttgaaacaattttcagattatctcaagatagacctacg 
attgttcgtttacttgatccaccgttacatgagttcttaccatcatctgaagaagatata 
aacaatgtttctcaacagctgaatgtatcttcagagttcttacgcaagcgaatcgttgac 
ttaaatgaggtcaattcaatgcttggtcatcgtggttgtcgtttggctgtgacttatcca 
gagttatatgagatgcaagttgaagctatcattgaaagtgttattaagcttcaaaaagag
ggcataacgtgcctaccagaaattatgattcctctcgtgtcaacagtagaagaatttaca
actttaaaagaacgattagttaatacaattacacatttagaaaaagaatcacaacaagat 
atacaatatatgataggtactatgattgaaacgcctagagcatgcttgattgcgaatgac 
cttgcgaaacattgtgatttcttcagttttggtactaatgatttaacgcaattgacattt
ggtttctctagagatgatgcaggaaaattcataaatgtgtatactgaaaataacatttta 
cagcttgacccattccaaactttagatagagaaggtgtaggacgactaattcaattagct 
gttgaacaagctaaaaatacaaatccagagataaaaattggtgtattta 

Sequence 2948

MKDLLGGKGANLSEMKRLGLPVPDGFTITTEACITYLKQNEELPTEVKTQLIDHLAAFSK 
RTGKAFSSDDNLLLVSVRSGAKISMPGMMDTILNLGLNDDNVKKLVDKTNDARFAYDCYR 
RLLQMFGEVVYGIPMTAFDTYFNDFKTKHRYQNDAEIPAEGLQTICEKYKEIYVEEAYKP 
FPQEPLKQLEEAIEAVFKSWDNDRARVYRDLNDIPHDIGTAVNIQEMVFGNSGENSGTGV 

AFTRNPVTGENHLFGEYLLNAQGEDVVAGIRTPKDIDTLKQQMPDVHQEFVDVTKQLEKH
YKDMQDIEFTIENGKLYLLQTRNGKRTAKAAIKIAVDLVHEQLITREEAVSKVEVKSIDQ
LLHPNFNEESLKQATVVSKMGLPASPGAATGKVVFSAEEAKLQAENGNKVVLMRPETSPE
DIEGMVASEAIVTTHGGMTSHAAVVARGMGKCCVTGCSNVEIDTVNKTVYYPEGELHEGD
IVSVDGSAGDLYLGAIETVNAEHSEEFDQFMTWSEEIARLQVRMNAETPQDIKAGYNFGS
KGIGLVRTEHMFFGPERLIEMRRFILASNHDERVQALEKIKTYQVEDFETIFRLSQDRPT
IVRLLDPPLHEFLPSSEEDINNVSQQLNVSSEFLRKRIVDLNEVNSMLGHRGCRLAVTYP
ELYEMQVEAIIESVIKLQKEGITCLPEIMIPLVSTVEEFTTLKERLVNTITHLEKESQQD
IQYMIGTMIETPRACLIANDLAKHCDFFSFGTNDLTQLTFGFSRDDAGKFINVYTENNIL 
QLDPFQTLDREGVGRLIQLAVEQAKNTNPEIKIGVFX 


Sequence 2949 
Contig_0528_pos_3612_3271 
No hits found 

atgtcaaatagcgcttcaatatctcagagcaacgtcgcaagtcaaagtactacagcaagt 
ttgagccaatctgaatcagcaaatgattcaatgagttcatctctgtccgagtctaactca
ataacatccgaaagtaatacaaatagcaaatcggaaattgaatcaaaaagtacgtctaca 
agcgagttcttgtcagaatcaggaagtgtatctaactcagaaaaatctgagtcaatttct 
cattctcaatcaacatcagctacaccttcttctcaatcgacttaccaacaacaacctaaa 
gaagagaagaaaggtttctttgcacgtctatttaacttataa 


Sequence 2950 

MSNSASISQSNVASQSTTASLSQSESANDSMSSSLSESNSITSESNTNSKSEIESKSTST 
SEFLSESGSVSNSEKSESISHSQSTSATPSSQSTYQQQPKEEKKGFFARLFNL* 



Sequence 2951

Contig_0530_pos_2171_1191

is similar to (with p-value 3.0e-28) 

>gp:gp|Y14325|ATY14325_1 Arabidopsis thaliana mRNA for meva
lonate diphosphate decarboxylase. NID: g2288886. >gp:gp|AC00
5499|ATAC005499_10 Arabidopsis thaliana chromosome II BAC T6
A23 genomic sequence, complete sequence. NID: g3785992. >gp:
gp|Y17593|ATH17593_1 Arabidopsis thaliana MVD1 gene, exons 1
to 9. NID: g3250735.
gtgaaaagtggcaaagcacgagcacatacaaatattgcgttgattaagtattgggggaaa 

gctgatgaaacttacattattcctatgaataatagtttatcagttaccttagatagattt
tatactgaaacaaaagtgacatttgaccctgattttactgaagattgccttattttaaat 
ggtaatgaagtgaatgccaaagagaaagaaaagattcaaaactatatgaatatagtgaga 
gatttggctggaaatcgtttgcatgcgcgaattgaaagtgaaaattatgtgccaacagca 
gcaggacttgcttcttcagcgagtgcttacgctgctttagctgccgcttgtaatgaagct 

ttgtcattgaacttatcagatacagacttatcacgattagctcgacgtggttcaggttct
gcttctagaagtatttttggtggatttgccgaatgggaaaaagggcatgatgatttaact 
tcatatgcacatggtattaattccaatggttgggaaaaagatttatcaatgatatttgta 
gtcattaacaatcagtcaaaaaaagtatctagtaggtcaggaatgtcactaacaag.igat 
acute: tagattttatcaatattggttggatcacgttgatgaagatttaaatgaag/.aaaa 
gaggcagtcaaaaatcaagattttcaacgcttaggagaagtcattgaagcaaatggttta
cgtatgcatgccactaacttaggcgctcaacctcctttcacgtatttagtgcaagaaagc 
tacgatgctatggcgattgtggaacagtgtcgaaaagccaatttaccttgttactttaca 
atggacgcgggtcccaatgtaaaagttttagtagaaaagaaaaataaacaagctgtgatg 
gaacaatttttaaaagtatttgacgaatcgaagattatagcaagtgatatcattagctct
ggtgttgaaattattaagtaa 

Sequence 2952 

VKSGKARAHTNIALIKYWGKADETYIIPMNNSLSVTLDRFYTETKVTFDPDFTEDCLILN 

GNEVNAKEKEKIQNYMNIVRDLAGNRLHARIESENYVPTAAGLASSASAYAALAAACNEA
LSLNLSDTDLSRLARRGSGSASRSIFGGFAEWEKGHDDLTSYAHGINSNGWEKDLSMIFV
VINNQSKKVSSRSGMSLTRDTSRFYQYWLDHVDEDLNEAKEAVKNQDFQRLGEVIEANGL
RMHATNLGAQPPFTYLVQESYDAMAIVEQCRKANLPCYFTMDAGPNVKVLVEKKNKQAVM
EQFLKVFDESKIIASDIISSGVEIIK*


Sequence 2953 
Contig_0532_pos_1713_3254 

>sp:sp|P54715|PTIB_BACSU PTS SYSTEM, ARBUTIN-LIKE IIBC COMP
ONENT (PHOSPHOTRANSFERASE ENZYME II, BC COMPONENT) (EC 2.7.1
.69). >gp:gp|Z99108|BSUB0005_89 Bacillus subtilis complete g
enome (section 5 of 21): from 802821 to 1011250. NID: g26330
55. >gp:gp|D50543|D50543_3 Bacillus subtilis DNA for 76-degr
ee region, complete cds. NID: g1486240.

atgtttgcttttttcggtattgttttgggattcgctacattatttaaaaatccaaccatt 

atgggaggattagctgatcagcaaacattttggtttaaattttggtctgttattgaatca
ggtggttgggtaatatttacacatatggaaattgtctttgtagttggcttaccattatct 
cttgctaaaaaagcaccaggacatgcagctttagcagctctaatgggatatttaatgttt 
aatacttttatcaatgcaattttaactcaatggccacatacttttggcgctaatttaaaa 
aaaggtgtagaaaacacaacaggattaaaatcgattgcaggtattgaaacgttagatacc 

aatattttaggtgcaatcattatctcaggaataataacgtggatacataatagatattac
agtaagcgtttacctgaaatgttaggtgtatttcaaggattaacattcgttgtaacaatc 
tctttctttgtcatgctaccagtggcagcaatcacttgtgttgtttggccaacgattcaa 
cacggtattgcttcaattcaatattttattgttgcatcaggttatataggtgtttggtta 
tatcatttcttagagcgtgtactaatacctacaggattgcatcattttatctatgcacca 

atcgaagtaggtccagtagttgttaatcatggtttgaaagcagaatggcttcaacactta
aaccagtttgccgaaagtaataaaccacttaaagaacaattcccatatggatttatgttg 
caaggaaatggaaaggtttttggttgtctaggtatagcattggcaatgtatgcaactaca 
ccaaaagaaaatcgtaaaaaagttgctgcattattaataccagcaacacttacggcagta 
gtagctggtattacagaaccacttgaatttacattcttatttattgcgccatttttattc 

gtattacatgcactactagcagcaactatggatacactgatgtatggatttggtgttgta
ggtaatatgggtggcggtgtactagattttattgcaactaactggataccattaggaaaa 
gcacattggatgacatatgtctttcaagtagtaattggtttaatctttgttgcaatttac 
tacttcttatttaaatatttaattttaaaatttgatattccattaccaggacgtaagaaa 
ggtgaagaagaagttaaattattttccaaacaagattataaagataaaaaaggagattca 

actcgcaatcattcacctaatagtgaatatgaagaaaaggcgatgtactacctagaaggt
cttggaggaaaagaaaatattaaagacgttacaaattgtacgacacgtctacgtttaact 
gttaaagacgaaagtaaagttcaagaaagtgcctattttacacataatcaaatgtctcat 
ggtttagtaaaaagtggcaaaagtgttcaagtcgtcgttggaatgtctgtacctcaggta 
agagaagcatttgaaaatattgtaaatgatgatctatcttag 


Sequence 2954 

MFAFFGIVLGFATLFKNPTIMGGLADQQTFWFKFWSVIESGGWVIFTHMEIVFVVGLPLS 
LAKKAPGHAALAALMGYLMFNTFINAILTQWPHTFGANLKKGVENTTGLKSIAGIETLDT 
NILGAIIISGIITWIHNRYYSKRLPEMLGVFQGLTFVVTISFFVMLPVAAITCVVWPTIQ
HGIASIQYFIVASGYIGVWLYHFLERVLIPTGLHHFIYAPIEVGPVVVNHGLKAEWLQHL
NQFAESNKPLKEQFPYGFMLQGNGKVFGCLGIALAMYATTPKENRKKVAALLIPATLTAV 
VAGITEPLEFTFLFIAPFLFVLHALLAATMDTLMYGFGWGNMGGGVLDFIATNWIPLGK 
AHWMTYVFQVVIGLIFVAIYYFLFKYLILKFDIPLPGRKKGEEEVKLFSKQDYKDKKGDS 
TRNHSPNSEYEEKAMYYLEGLGGKENIKDVTNCTTRLRLTVKDESKVQESAYFTHNQMSH 
GLVKSGKSVQVVVGMSVPQVREAFENIVNDDLS*

Sequence 2955
Contig_0533_pos_2932_4371
is similar to (with p-value 7.0e-20)

>gp:gp|AB005556|AB005556_1 Plectonema boryanum DNA for NADP
H:protochlorophyllide oxidoreductase, hypothetical protein,
partial and complete cds. NID: g3123724.

atggaggtgtttaacatgacaaatcaattatttattaacaatgaatttatagaaagtcag 

tctaaagagacaatggatgtcattaatccagctactggcgaggcatttgatactatcact
cttgcaactgaagaggaagtaaacgacgccattgaaaaatcgcaacaagcacaacttgaa 
tgggagcgcgtgcctcaacctacacgtgcggaacatgttaaattacttatacctttatta 
gaaaaaaatcgcgatgaaatagctcaattatacgtaaaagaacaaggtaaaactttagcg 
caagcttatggagaaattgacaaatcaatctcatttatcgattatatgacaagtctgagt 

atgtcagataaaggacgtgttctacaaaatagtattgcaaatgaaacgattcaaattatc
aacaaacctatcggagttactgctggtattgtgccatggaacgcaccgatacttgtcctt 
atgcgaaaagtcattccagctatagtaactggttgttcagtagtgattaaacctagtgaa
gagacaacgttactcactcttcgattagctgaattattcagagcatcaactataccagca 
ggattgtttcaaattgttcctggcactggagaaacagtaggtacacaattagcttcgcat 

aaagacattcaacttatttctttaactggaagtatgagagctggtaaatctgtttacgaa
aatgctgctcaaactgtaaagaaagtaaatttagaattaggtggaaacgcaccagtcatt 
gtcacatcaaatgccgatttagataaagcagttgactatatcgtgacagcccgtataaat 
aatgcaggtcaagtttgtacgtgccctgaacgtatctttgtacatgaagatgttcacgat 
gactttttaaataaagtaacttccaaaatgaaaagcttaactgttggagatccatttgat 

gaaaacaccgattacggcgcaattattaaccaaaaacaacttgatagtattcatgaaaag
gttcaagatgctattaaaaatggtgcaacattgatgactggtggacatcaattaaaacgc 
catggtttcttctacgcaccaacagtattagataacgttagaaaagacgataatgtgttt 
aaagatgaaatttttggtccagttcttgcgataacaacctaccgtgatattgaacaagtg 
attgaagacgctaatgatacaaacgctggcttatcttcttatatcttctctgaaaattta 

acagaagtaatgacagcaaccgaacgtctaaaatttggtgaagtatatgcaaattgtgag
gctgaagaagtggttaatggctatcacgcaggttggcgtgaatcaggcttaggtggcgct 
gacggtattcacggttttgaagaatactacaataccacagtaagttatattagatactaa 

Sequence 2956 

MEVFNMTNQLFINNEFIESQSKETMDVINPATGEAFDTITLATEEEVNDAIEKSQQAQLE
WERVPQPTRAEHVKLLIPLLEKNRDEIAQLYVKEQGKTLAQAYGEIDKSISFIDYMTSLS
MSDKGRVLQNSIANETIQIINKPIGVTAGIVPWNAPILVLMRKVIPAIVTGCSVVIKPSE 
ETTLLTLRLAELFRASTIPAGLFQIVPGTGETVGTQLASHKDIQLISLTGSMRAGKSVYE 
NAAQTVKKVNLELGGNAPVIVTSNADLDKAVDYIVTARINNAGQVCTCPERIFVHEDVHD 

DFLNKVTSKMKSLTVGDPFDENTDYGAIINQKQLDSIHEKVQDAIKNGATLMTGGHQLKR
HGFFYAPTVLDNVRKDDNVFKDEIFGPVLAITTYRDIEQVIEDANDTNAGLSSYIFSENL
TEVMTATERLKFGEVYANCEAEEVVNGYHAGWRESGLGGADGIHGFEEYYNTTVSYIRY*



Sequence 2957

Contig_0533_pos_8501_7722
is similar to (with p-value 2.0e-98)
>sp:sp|P25553|ALDA_ECOLI ALDEHYDE DEHYDROGENASE A (EC 1.2.1
.22) (LACTALDEHYDE DEHYDROGENASE).

gtgggttacggtgtagcacaagcaggtgctgaacgtatcgttcctacaacattagaacta
ggtggtaaaagtgctaatattatctttgatgatgctaatttagagcaagtgattgaaggt 
gttcaacttggtatattatttaaccaaggtgaagtatgtagtgcaggttcaagattatta 
gtacaatcatctatttacgatgaattgttgccaaaattgaaagaagcatttgaaaatatt 
aaagttggcgatccatttgatgaagatactaaaatgagtgcgcaaactggaccagaacaa 

ttagataaaattgaaagttatataaaaattgctgaagaagatgacaaagcgaacatctta
actggtggtcatcgaatcacagataacggcttagacaaaggttacttctttgagcctaca 
attattgagattaacgataacaagcatcaacttgctcaagaagaaatcttcggtccagtt
gtagtagttgaaaaattcgatgatgagcaagaagctatcgaaattgcaaatgattctgag 
tatggtttagctggaggtatcttcactacagatattcatcgtgcattaaatgt^gctaaa 
gctatgagaacaggtcgtatttggattaatacttataatcaaattcctgctggtgcgcca
ttcggaggatataaaaaatcaggtattgggcgcgaagtatataaagatgctatcaaaaac 
tatcaacaagttaaaaatatctttattgatacaagcaaccaaactaaaggtttatattaa 

Sequence 2958

VGYGVAQAGAERIVPTTLELGGKSANIIFDDANLEQVIEGVQLGILFNQGEVCSAGSRLL 
VQSSIYDELLPKLKEAFENIKVGDPFDEDTKMSAQTGPEQLDKIESYIKIAEEDDKANIL 
TGGHRITDNGLDKGYFFEPTIIEINDNKHQLAQEEIFGPVVVVEKFDDEQEAIEIANDSE 
YGLAGGIFTTDIHRALNVAKAMRTGRIWINTYNQIPAGAPFGGYKKSGIGREVYKDAIKN 
YQQVKNIFIDTSNQTKGLY*

Sequence 2959 
Contig_0533_pos_0_1310
is similar to (with p-value 7.0e-62)
>sp:sp|P40047|DHA3_YEAST ALDEHYDE DEHYDROGENASE, MITOCHONDR
IAL 3 PRECURSOR (EC 1.2.1.3). >gp:gp|U56605|SCU56605_1 Sacch
aromyces cerevisiae precursor aldehyde dehydrogenase gene, n
uclear gene encoding mitochondrial protein, complete cds. NI
D: g1336077.

gtgtttaatatgtcattggaattgcccgtgatgctaacgattgtgttgttcttagcacta
ggcatttttagtcaatggttagcgagtagaataaaatggccatcgattgttgtcatggcc 
atcgtaggtttacttgtaggacctatttttggattagcaaatccaaaagaggcacttgga 
cctgaggcatttagttcaattgtatctcttgctgtagcaattatattatttgaaggtagt 
agtaatctagattttagagaattaaaaggcatttctaaagctgttataagaattattaca 

ataggagcgggaattgcatggattttaggagcaatcgctttacatgtcactatgaatttc
cctctgtctatttcatttgttatcggagggctattcctaatcactggtccaaccgtgatt
caacccttgctgaagcaagcgaaagtaaaaagaaatgtcgattcagtattaagatgggaa 
agtattatcttagatcctattggacctattattgcactcactgcattttatgtattccag 
atttttgaggaagggataggctttgtcgttattattctctttatcttgaagctcttagct 

gcaattttaataggttttggtgcagcctttctatttaattggcttataggtcaagataaa
attcctcaaagtttaatgccacccattcagtttgtatttattcttttaacatttagtatc 
tgcgatgagatattatcagaatcaggtttactagctgtaacaatttttgggttaatgatg 
gcgcgtaaaaaacgtcacgatctcatctttaaagaatcagaccactttatagataatgca
tcatccatacttgtgagtactgtatttattttaattacgtcttctctcactaaagatgtg

ttgctcaatgtgttatcttggcagctcatactctttagtttggttatgattgtattagta
agaccaatttcagtgtttctttcaacattaggtactgaaataactaaaaaggaacgtgca 
gtagtagcactaatggcgcctagaggtattgtggttttaacagtggcacaattcttctca 
agtttatttatggacgataaaattcctatggctcaatatattacaccagtaacttttggt 
cttgtatttatcactgtagtcatttatggatttggttttacacctttaagtaaactgttc 

ggtgtagcaagtacggagccaccaggcgtaatcattgtcggagaaagtgaattttcgttc
catcttggtattaatctaagggatcatggtatacccgtcatgatgttcaa 

Sequence 2960 

VFNMSLELPVMLTIVLFLALGIFSQWLASRIKWPSIVVMAIVGLLVGPIFGLANPKEALG 
PEAFSSIVSLAVAIILFEGSSNLDFRELKGISKAVIRIITIGAGIAWILGAIALHVTMNF
PLSISFVIGGLFLITGPTVIQPLLKQAKVKRNVDSVLRWESIILDPIGPIIALTAFYVFQ 

IFEEGIGFVVIILFILKLLAAILIGFGAAFLFNWLIGQDKIPQSLMPPIQFVFILLTFSI
CDEILSESGLLAVTIFGLMMARKKRHDLIFKESDHFIDNASSILVSTVFILITSSLTKDV
LLNVLSWQLILFSLVMIVLVRPISVFLSTLGTEITKKERAVVALMAPRGIVVLTVAQFFS
SLFMDDKIPMAQYITPVTFGLVFITVVIYGFGFTPLSKLFGVASTEPPGVIIVGESEFSF
HLGINLRDHGIPVMMFX

Sequence 2961 
Contig_0534_pos_7490_6273
is similar to (with p-value 3.0e-99)

>sp:sp|P39312|CYCA_ECOLI D-SERINE/D-ALANINE/GLYCINE TRANSPO
RTER. >pir:pir|S56433|S56433 hypothetical protein o470 - Esc
herichia coli >gp:gp|U14003|ECOUW93_120 Escherichia coli K-1
2 chromosomal region from 92.8 to 00.1 minutes. NID: g126317
2. >gp:gp|AE000492|AE000492_4 Escherichia coli K-12 MG1655 s
ection 382 of 400 of the complete genome. NID: g1790649.

atgttcatgagagctatgggagaattactgttatccaatttaggatttaaatcgtttggt 
gacattgctcatcatcatattggttctatggcaggttttatggtggggtggacatattgg 
ttaacatggattatttcaggaatggcagaagtgactgctgttgccaagtatgtttccttc
tggtatccaacaattccaaactggttaacagctgcagcgactattttagttttagttgct
ttaaatctattcagtgctaaattatttggagaattagaattttggctatctattattaaa
gttttgactattttagctttgatagccgttggtgttgttatgattgtatttggaatgaag 
acaagctatggccctgcaacggtaacgaatatatggaaagacggaggctttttccctaat 

ggtgcacaaggtttcttcatgtcattccaaatggcaattttctcatttattggtattgag
ttgattggaataactgcaggggagactaaagatcctcacaaaacaattcctcaagcaatt 
aataatgtaccgtttagaatattattattttatataggatcgttggcagttatcatgtct 
gttgtaccatggcaacaattgaatcctgctgacagtccatacgttaaaatgtttggatta 
gttggaatcccttttgcagcaggtattattaactttgttgtacttacagctgcagcctct 

tcttgtaatagtggtatatttgctaatagccgtacgatgtttggattagctggaagaaag
caaggtccagcattcttacatagaaccaataagcacggcgtaccacattatgctatttta 
gtgacatgtggcttattaagtatttcagtcgtgttaaatgcaatttttaaagatgcgact 
aaagtgttcgtacaaattacaacattttcaactgttttaaatattatgatttggacaatt 
attatgatcgcgtatctaggttatttaagacatgaaccgaaacagcataaagaaagtaac 

tataaaatgtggggcggaaaatacatggcttacagtattttagggttctttgcatttatt
tttattatactattgattaatagtgcaacgcgttatgccgtactttctgcacccgtatgg 
tttgttatcatgctattgatgtatcaaaaatataaaaaagaatctcgcaaagctaaaatt 
aaaaatgaggaagagtaa 

Sequence 2962

MFMRAMGELLLSNLGFKSFGDIAHHHIGSMAGFMVGWTYWLTWIISGMAEVTAVAKYVSF 
WYPTIPNWLTAAATILVLVALNLFSAKLFGELEFWLSIIKVLTILALIAVGVVMIVFGMK 
TSYGPATVTNIWKDGGFFPNGAQGFFMSFQMAIFSFIGIELIGITAGETKDPHKTIPQAI 
NNVPFRILLFYIGSLAVIMSVVPWQQLNPADSPYVKMFGLVGIPFAAGIINFVVLTAAAS
SCNSGIFANSRTMFGLAGRKQGPAFLHRTNKHGVPHYAILVTCGLLSISVVLNAIFKDAT
KVFVQITTFSTVLNIMIWTIIMIAYLGYLRHEPKQHKESNYKMWGGKYMAYSILGFFAFI 
FIILLINSATRYAVLSAPVWFVIMLLMYQKYKKESRKAKIKNEEE* 

Sequence 2963 
Contig_0539_pos_0_15071

is similar to (with p-value 8.0e-35)

>gp:gp|X81475|MHLMP_1 M.hominis lmp1 and lmp2 genes. NID: g
587470.

atgggcacgttaaaatcattagttgctaaacaacctacagtacaaaaaacaagtgtttat 

attaacgaagatcaacctgagcaatctgcctacaatgattccattacaatgggacaaact
ataattaataaaacagctgatccagtacttgataaaactttagttgataacgcaatcagt 
aacatttcaactaaagagaatgcactgcatggtgaacaaaaattaacaactgctaaaacg 
gaagcaattaatgcacttaatacattagctgatttaaacacacctcagaaagaggctatt 
aaaacagctattaacactgctcatacaagaactgatgtaactgcagagcaaagtaaggct 

aatcaaataaatagtgcaatgcacacgttgagacaaaacatttctgacaacgaatcagta
acaaacgaaagtaattatattaacgctgaacccgaaaaacaacatgcctttactgaggct 
ctaaataatgctaaagaaatagttaatgaacaacaagccactcttgatgccaattcaatt 
aaccaaaaagcacaagcgattcttactactaaaaatgctttagatggtgaagaacaatta 
cgtcgtgctaaagaaaatgccgatcaagaaatcaatacgttaaatcaattgactgatgcg 

caaagaaatagtgaaaaaggtttagtcaacagttctcaaactagaacagaagttgcttct
caattagcaaaagctaaagaactaaataaggtgatggaacaactgaataaccttatcaat 
ggtaaaaaccaaatgataaatagcagtaaatttatcaatgaagatgcgaaccaacaacaa 
gcatattcaaatgcgattgcaagtgcagaagtgcttaaaaacaaatcacaaaaccctgaa 
ttagataaagtaacaattgaacaagcaattaataatattaattctgcaattaataatcta 

aacggtgaagctaaactgactaaagctaaagaagatgctgttgcttcaataaacaaccta
agcggattaactaacgagcaaaaaacaaaagaaaatcaagccgttaatggctctcaaact 
agagaccaagttgctaacgttttacgtgattcaaaggcattagatcaatctatgcaaaca 
ttacgtgatttagttaacaatcaaaatgtaatacattcaacaagtaattattttaacgag 
gattcaactcaaaagaatacttatgataatgcaattgataatggctcgacatatataact 
ggtcaacacaattcagaattaaataaatctactattgatcaaacgattagccaaattaac
acagctaaaaatgatttacatggtgcagaaaagttacaaagagataagggaactgctaat 
caagaaattggacaattaggttatttaaatgaccctcaaaaatctgctgaggaatcctta 
gtcaacggttcaaatacacgttctgaagtagaagagcatcttaatgaagctaaatcatta 
aataatgcaatgaaacaattaagagataaagtagctgaaaagactaatgtcaaacaaagt
agcgattacattaatgattcaactgaacatcaacgtgggtatgatcaagcacttcaagaa 
gcagaaaatattattaatgaaatcggtaatccaacattaaataaatcggaaattgaacaa
aagttacaacaattgactgacgctcaaaatgcgttacaaggttcacatctattagaagaa 
gctaaaaataatgcgattactgaaatcaataaacttacagcattaaatgatgcacaacgt 

caaaaagcaattgaaaatgttcaagcacagcagacaatcccagcagttaatcaacaatta
actttggatagagaaataaatactgcaatgcaagctttacgagataaagtaggccaacaa 
aataacgttcaccaacaaagtaattatttcaatgaagatgaacaaccaaaacataactat 
gataattctgtacaagccggtcaaactattattgataaacttcaagatccaatcatgaac 
aaaaatgaaattgagcaggctattaatcaaatcaatacgactcaaacagcgttaagtgga 

gaaaataaattacacactgaccaagaaagcacaaatagacaaatagaaggtttatctagt
ttgaacacagctcaaatcaacgccgaaaaagatttagtcaatcaagctaaaacaagaaca
gaugt cqctcaaaagttagctacagctaaagaaataaattctgctatgagtaatt ;aaga 
gatggcattcaaaataaagaggacatcaaacgtagcagtgcatatatcaacgcagatccg 
actaaagttacagcttacgatcaagcactacagaacgcagaaaatatcatcaatgccaca 

ccaaacgtagagcttaataaagctacaattgaacaagcgctatcacgcgttcaacaagca
caacaagatcttgatggtgttcaacaattagctaatgctaaacaacaagctacacaaact 
gtcaatgggttaaatagcttaaatgacggtcaaaagcgtgaattaaatctattaattaat 
tcagctaatacccgtacaaaagtacaagaagaattaaacaaagcaactgaatcgaaccat 
gcgatggaagctttaagaaacagtgttcaaaacgttgatcaagtaaaacaaagtaacaat 

tatgtcaatgaagatcaacctgaacagcacaattatgataatgctgtcaatgaagctcaa
gctacaatcaacaacaatgctcaacctgttctagacaaattagctatagaacgtttaact 
caaactgttaacactacaaaagatgcattacatggtactcaaaaactgatacaagaccaa 
caagctgctgaaactggaatacgtggtttaacgagtctcaatgaacctcagaaaaatgct 
gaagtagctaaagtaactgcagcaacaacacgtgatgaagtgagaaatattcgtcaagaa 

gcaacaacattagatactgcaatgcttggtttacgtaaaagcattaaagataaaaacgat
actaaaaatagtagtaaatatattaatgaggatcatgaccaacaacaagcttatgacaat 
gctgtaaataatgctcaacacgttatcgatgaaactcaagcaacgttaagctcagataca 
atcaatcaattggcaaatgccgtaactcaagctaaatctaatcttcatggagatactaaa 
ctacaacacgataaagatagtgctaaacaaacgattgctcaattacagaatttgaattca 

gctcaaaaacatatggaagattctttaattgataatgaatctacacgtacgcaagtccaa
cacgatttaacagaagctcaagctttagatggtttaatgggtgccttaaaagaaagtatt
aaagatr-dtactaatattgtttcaaacggtaattacatcaatgcggaaccatctaagaaa 
caagcatatgatgcagctgtacaaaatgctcaaaatataataaatggaacgaatcaacca 
acaattaataaaggtaatgtcactacagcaacacaaaccgtgaaaaatactaaagatgcc 

ttagacggtgatcatagattagaggaagctaaaaataatgccaatcaaacaatcagaaat
ctatctaatttgaacaatgcccaaaaagatgcagagaaaaatctagttaatagcgcatca 
acattagaacaagttcaacaaaacttacaaaccgctcaacaattagataatgctatgggt 
gagttacgacaaagtattgctaacaaagatcaagtgaaagcagatagtaaatatctaaat 
gaagatcctcaaattaagcaaaactatgatgatgcagttcaacgtgttgaaactattatt 

aacgaaactcaaaaccctgaattacttaaagcaaacattgaccaagcaactcaatccgtt
caaaatgcagaacaagctttacatggtgctgaaaaattaaatcaagacaaacaaacgtct 
tcgacagaactagatggattaacagatttaacagatgcacaacgtgaaaaactcagagaa 
caaattaacacttctaatagtagagatgatattaagcaaaaaattgagcaagcaaaagca 
ctaaatgacgcaatgaaaaaacttaaagaacaagttgcgcaaaaagatggtgttcatgct 

aacagtgattatacaaatgaagattctgcacaaaaagatgcgtataataatgcacttaaa
caagcggaagacattattaataacagctcaaatcctaacttaaatgcacaagacattact 
aatgctttaaataatattaaacaagcacaagataaccttcatggagctcaaaaattacag 
caagacaaaaatacaactaatcaagccattggtaacttaaatcatcttaatcaacctcaa 
aaagatgcgcttatacaagctattaatggagctacatctagggaccaagttgcagaaaaa 

cttaaagaggccgaagcgcttgatgaagctatgaaacaacttgaagatcaagtgaatcaa
gat J";*:ccaatttcaaatagcagcccattcataaatgaagactcagacaaacaaaff3act 
tataatgataaaatccaagctgcaaaagaaataattaatcaaacatctaatccaacctta 
gataaacaaaaaattgctgatacacttcaaaatattaaagatgcagtgaataatttacat 
ggtgatcaaaaattagctcaatctaaacaagacgctaataatcaattaaatcatttagat 
gacttaaccgaagaacaaaaaaaccattttaaaccgttaattaataatgctgatactcga
gatgaggtaaataaacaactagagattgctaaacaattaaatggtgatatgagtacactt 
cataaagtcataaatgataaagatcaaattcaacatttaagcaattacattaatgctgat 
aatgataaaaaacaaaattatgataatgctattaaagaagctgaggatttaattcataat 
catccagatacattagatcataaagcattacaagatttattaaacaagatagaccaagcg
cataacgaattaaatggagaatccagatttaaacaggctttagacaatgctttaaacgac 
atagatagcttaaacagtctcaatgttccacaacgtcaaactgttaaggataacatcaac 
catgtgacaactctagaaagtttagctcaagaattgcagaaagcaaaagagcttaa'^gat 
gctatgaaagcaatgagagatagcattatgaatcaagagcaaattcgtaaaaatagcaat

tatactaatgaagacttagctcaacaaaatgcctataatcatgcagtagataatataaat
aacattattggtgaagacaatgcgacgatggatcctcaaataatcaaacaagcaactcaa 
gatataaatacagctataaatggattaaatggagatcaaaaacttcaagatgcaaagaca 
gatgctaaacaacaaattactaactttactggtttaactgaaccacaaaaacaagcattg 
gaaaacatcattaaccaacaaacaagtagagcaaatgttgctaaacagttaagtcatgct 

aaattcttaaatggaaaaatggaagaattaaaagttgcagtagccaaagcgtcattagta
agacaaaatagtaactatattaatgaagatgtctctgaaaaagaagcatatgaacaagct 
attgcaaaaggtcaggaaataattaattcagaaaataatccaacaataagtagtactgat 
atcaatcgtaccattcaagaaattaatgatgctgaacaaaatcttcatggtgaaaataaa 
ttaagacaagcacaggaaattgcaaagaatgaaatacaaaatctagacggattaaattca 

gctcaaataacaaaattaatccaagatataggcagaacaacaactaaacctgcagtaact
cagaaactagaagaagcaaaagcaataaaccaagctatgcaacaacttaaacaaagtata 
gccgataaggatgctactctaaattctagtaactatctcaatgaagattctgagaaaaag 
ttagcgtacgataatgctgtaagccaagctgaacaactcataaatcaacttaacgaccca 
actatggatataagtaatattcaagctattactcaaaaggtcattcaagcaaaagattca 

ttgcacggtgcgaataaacttgcacaaaatcaagcagattcaaatttaataataaatcaa
tcaacaaatttaaatgataaacaaaagcaagcattaaatgacttaattaatcatgctcaa 
actaaacagcaagtggcagaaataattgcacaagctaataagttaaataacgaaatgggc 
acuc;-. af.r.aacactcgtagaagaacagtcaaacgttcatcaacaaagtaaatata*; caat 
gaagatccgcaagttcaaaatatttataatgactccattcaaaaaggtcgagaaatatta 

aacggcactacagatgatgttttaaacaacaataaaatagcagatgccattcaaaacatt
catttaactaaaaacgatttacatggtgatcaaaaattacaaaaagcacaacaagatgca 
accaatgaattaaactatttaacaaatctaaacaattctcaaagacaaagcgagcatgat 
gagattaactctgctccttcaagaactgaagtttctaatgatttaaatcatgctaaagca 
cttaatgaagctatgcgtcaacttgagaatgaagttgctcttgaaaacagtgttaaaaaa 

ttaagcgactttatcaatgaagatgaagcggcacaaaatgaatatagtaatgcacttcaa
aaagctaaagacattatcaacggcgttccaagtagcactttagataaagctacaattgaa 
gatgctttattagaattgcaaaatgctagagaaagtttacatggtgagcaaaaacttcaa 
gaggctaaaaatcaagctattgctgaaattgataatttacaagcattaaatcctggacag 
gttcttgctgaaaaaacattagttaaccaagcatcaaccaaaccagaagttcaagaagcc 

ttacaaaaagcaaaagaacttaatgaagctatgaaagcactgaaaactgaaataaataaa
aaagaacaaatcaaggctgatagtagatatgtaaatgctgacagtggtcttcaagcaaat 
tacaattctgcgttaaattatggttctcaaattattgcaactacccaaccaccagagctt 
aataaagatgtaataaatagagcaactcaaacgattaaaactgctgaaaataatttaaat 
gggcaatctaaattagcagaggctaagtcagacggaaatcaaagcatcgaacatttgcaa 

ggattaacacaatcacaaaaagataaacaacatgatttaattaatcaagctcaaactaaa
caacaggtagatgatatcgtaaataactctaaacaattagataactctatgaatcaacta 
caacaaattgttaacaatgacaatacagtaaaacaaaatagtgatttcattaatgaagat 
tccacccaacaggatgcttataatcatgcaattcaagcagcaaaagatttgataactgct
catcctactatcatggataaaaatcaaatagatcaagctattgaaaatatcaaacaagca 

cttaatgatttacacggtagtaataaactatcagaagataaaaaagaagcttcagaacaa
ctacaaaaccttaatagcttgacgaacgggcaaaaagatacgattttaaatcatattttc 
agtgcaccaacaagaagccaagtaggagaaaaaattgcaagtgctaaacaattaaataat 
acaatgaaagcacttagagattctattgctgataataatgaaattttacaaagtagtaag 
tacttcaatgaagattctgaacaacaaaatgcttataatcaagccgtaaataaagctaaa 

aatataattaatgatcaaccaacaccagtaatggcaaatgatgagattcaaagtgtccta
aatgaagttaaacaaactaaagataatttacatggtgatcaaaaacttgctaacgacaag 
acagatgcccaagcaacattaaatgcgttaaattacttaaatcaagcgcaaagaggtaat 
cttgaaactaaagttcaaaactctaattctagaccagaagtacaaaaagtagttcaatta 
gcaaatcaacttaatgatgcgatgaaaaaattagatgatgctttaactggtaatgacgca 
ataaaacaaacgagtaattatattaatgaagatacttctcaacaagttaactttgatgag
tatacagatagaggtaaaaacatagttgctgaacaaacaaatccaaatatgtctccaact 

aatattaacactattgctgataaaattactgaagctaaaaatgatttacatggcgtacaa 
aaactagaacaagctcaacaacagtccatcaatactattaatcaaatgactggtctaaac 
caagctcaaaaagaacaattaaatcaagaaattcaacaaactcaaacccgttctgaagta
catcaagtaattaataaagcacaagctttaaatgattcaatgaatactttacgtcaaagt 
attactgatgaacatgaagttaaacaaacaagtaactacatcaatgaaactgttggtaat 
caaactgcatataacaatgccgttgatcgtgtaaaacaaataatcaatcaaacatctaat 
ccaactatgaatcctttagaggtggaacgtgcaacatcaaatgtaaaaacttctaaagat 

gcacttcatggtgaacgtgaattgaatgacaataaaaattcaaaaacttttgcagtcaat
cacttagataacctcaatcaagctcaaaaagaagcattaactcatgaaattgaacaagca 
actatagtttcacaagtaaataatatctataacaaagcgaaagctttaaataatgatatg 
aaaaaacttaaagatatcgttgctcaacaagataatgtgagacaatcaaacaattatata 
aacgaggatagtacacctcaaaatatgtacaacgatacaattaatcatgcacaatcaatc 

attgatcaagtagcaaaccctacgatgtctcatgacgaaatagagaatgcaatcaataac
ataaagcatgccatcaatgcactcgatggagaacataaattacaacaagcaaaagaaaat 
gcaaacttattgattaatagtttaaacgatttaaatgcaccacaaagagatgccataaat 
agattggttaatgaagctcaaacaagagaaaaagtagctgaacaacttcaaagtgctcaa 
gctttaaatgacgctatgaagcatttaagaaacagcattcaaaatcaatcatccgtaaga

caagagagcaaatatattaatgcaagtgatgctaaaaaagagcaatataatcacgcagtt
agagaagtcgaaaatattatcaatgaacaacatccaacattggataaagaaataattaag 
caactaacggatgctgtaaatcaagcgaataatgacttaaatggcgttgaattattagat 
gctgataagcaaaacgcacatcaatcgatacctacattgatgcacttaaatcaagcacaa 
caaaacgcattaaatgaaaaaattaataacgcagttaccagagctaaagttgcggctatt 

attggccaagcaaaaatactcgatcatgctatggagaatttagaagaaagtatcaaagat
aaagagcaagtcaaacagtcaagtaactatattaatgaagaccctgatgttcaagaaaca 
tacaataacgccgttgatcatgtgacagaaatacttaatcaaacagtaaatccaacttta 
tctattgaagatatagagcatgctatcaacgaagttaatcaagcgaaaaaacaactcaga 
ggtaaacaaaaactttatcaaactatcgatttagctgataaagaattaagtaaattggat 

gatttaacatcacaacaaagcagttcaatatctaatcaaatatatactgctaaaacgaga
acagaagttgcccaagcaattgaaaaagcaaaatcattaaatcatgcaatgaaagcactt 
aacaaaatatataaaaatgcagataaagtgttagatagtagtcgattcattaacgaagat 
caacctgaaaaagaggcgtatcaacaagctataaatcatgttgattcaatcattcataga 
caaacaaatcctgaaatggatccaacagtaatcaatagcataactcatgaactcgaaaca 

gctcaaaataacttacatggtgatcagaaacttgctcatgcaaaacaagatgccgctaat
gtaattaatggtctaattcatcttaatgttgctcaacgcgaggtaatgataaatacgaat 
acaaatgctacaacacgcgaaaaagttgcaaagaacttagataatgctcaagctcttgat 
aaagctatggaaacactacaacaagtagttgctcataaaaataatatattgaacgatagt 
aaritstttaaatgaagattcaaaatatcaacaacaatacgatcgagttattgctgatgcc 

gaacaactacttaatcagacaacaaatccaacattagaaccttataaagtcgatattgtt
aaggataatgtcctagctaacgaaaaaatactatttggcgcagaaaaactatcatatgac 
aaatcaaatgcaaatgatgaaattaaacatatgaattatcttaataatgcacaaaagcaa 
tctataaaagatatgatttctcacgcagcattaagaactgaagttaaacaacttctgcaa 
caagctaaaacccttgatgaagctatgaaatcacttgaagataaaactcaagtagtgatt 

acagatactactttgcctaattacactgaagcttcagaggataaaaaggaaaaagtagac
caaactgtatcacatgctcaagcaatcattgataaaataaatggctcaaatgtaagttta 
gatcaagtacgacaagcactagaacaattaactcaagcatcagaaaacctcgatggtgat 
cagcgagttgaagaagctaaagttcatgctaatcaaacaattgaccaattaacacatctt 
aattcattacaacaacaaactgcgaaagaaagtgttaaaaacgcaacaaaactagaagaa 

atcgctactgctagtaacaatgctctggcattaaacaaagtaatgggtaaattagaacaa
ttcattaatcatgctgattctattgaaaatagtgataattatagacaagccgacgacgac 
aaaattatcgcttatgatgatgcactagaacatggacaagatatacaaaaatctaacgca 
acccaaaatgaagcaaaacaagcgttacaacaattaataaatgcagaaacatcgttaaat 
ggtttcgaaagattaaatcatgctagaccacgagctttagaatatattaaatcactagaa 

aaaataaacaatgctcaaaagtctgctttagaggataaagtaacgcaatcgcatgattta
ttagaattagaacatcttgtcaacgagggcacaaacctcaatgacattatgggtgaatta 
gctaacgcaatcgttaataactatgctccaaccaaagcaagtataaattatattaacgcc 
gataacctacgcaaagataactttactcaagctatcaacaatgcacgtgatgcactcaac 
aaaao*:cc;aggtcagaacttagatttcaatgcaattgatacatttaaagatgatai:3ttc 
aaaactaaagatgcacttaacggtattgaacgtttaacagctgcaaaatcaaaagcagaa
aaactaattgatagtttaaaatttattaataaagctcaattcacacatgcaaatgatgaa 
attatgaatactaattctattgcacaattgtctagaatcgtgaatcaagcatttgattta 
aatgatgcaatgaaatctttaagagatgaacttaataatcaagcttttcctgtccaagca 
agctcaaattatataaattcagatgaagatttaaaacaacaatttgaccatgctttaagt
aatgctcgaaaagtacttgcaaaagaaaatggtaaaaatttagatgaaatacaaattgag
ggactcaaacaagtgattgaggatactaaagatgctttaaatggtatccaacgtttatca 
aaagctaaagctaaagcaattcaatacgtacaatctttatcttatatcaatgatgcacag 
cgtcatattgctgaaagtaatattcacaactctgatgatttatcatctttagcaaataca 

ttatctaaagctagtgatttagataatgcaatgaaagacttacgagatactctagaaagt
aattcaacttctgttccaaatagtgtgaattatattaatgctgataagaatttacaaatt 
gaatttgatgaggcgctacaacaagcaagtgcaacaagttctaaaacttcagaaaatcca 
gcaacgattgaagaagtattaggtcttagtcaagccatttacgatacaaaaaatgcattg 
aatggtgaacaacgtcttgcaactgagaagagcaaagatttaaaattaataaaaggatta 

aaagatttaaataaagcacaacttgaagatgtcacaaacaaggtaaattcagcaaatact
ttaacagagttatctcagctcactcaatcaacgttaaaattaaacgataaaatgaaatta 
ttgagagataagcttaaaaccttagtaaatcctgttaaagcaagtttaaattatagaaac 
gctgattataatttaaaacgtcaatttaacaaagctttaaaagaagctaaaggcgtatta 
aataaaaatagcggtacaaatgtcaatatcaatgacattcaacatcttttaacacaaata 

gataatgctaaagaccaattaaatggtgaacgacgtctaaaagaacatcaacaaaaatct
gaagtatttattattaaagaattagatatacttaataatgctcaaaaagctgcaataatt 
aatcagattagagcgtctaaagacattaaaataattaatcaaatcgttgataatgcaata 
gaattaaatgatgctatgcaaggtttaaaagaacatgtagctcaattaacagcaactaca 
aaagacaacattgaatatttaaatgctgatgaagaccttaaaatacaatatgattacgct 

atcaacttagcgaataatgttcttgacaaagaaaacggtacaaataaagacgctaatatc
ataattggaatgattcaaaacatggatgatgctagagcacttctaaatggaattgaaaga 
cttaaagatgctcaaacaaaagcacataatgacattaaagatacgctcaaacgtcaactt 
gatgaaattgaacacgctaatgcaacatcaaattctaaagctcaagcaaaacaaatggta 
aatgaggaagctagaaaagcgttttctaatattaatcacgcaacatcaaatgatttagtt 

aatcaagcaaaagatgaagggcaatctgcaattgaacacatacatgcagatgaattacct
aaug-.:afkcLactagatgctaatcaaatgattgaccaaaaagttgaagatataaatc,v»ctta 
attagtcaaaatccaaacttatcaaatgaagaaaaaaataaactaatatctcaaactaat 
aagttagtaaatggaattaagaatgaaattcaacaagctataaacaaacaacaaatagaa 
aatgctacaacaaaactagatgaagtcattgaaactactaaaaaattaattatcgccaaa 

gcagaagctaaacaagtgataaaagagttatcacaaaagaaacgagatgcaataaataac
aacactgatttaacaccttctcaaaaggcacatgctttagcagatattgataaaacagaa 
aaagatgcacttcaacataticgaaaattctaattcaattgatgatatcaataacaataaa 
gagcatgcatttaatactttagctcatatcattatttgggatactgatcagcaaccatta 
gtttttgaactacctgaattgagccttcaaaatgctctagtaacaagtgaggtggttgtt 

cacagagatgaaactatttcattagaatctataattggagctatgactttaactgatgaa
cttaaagtcaatattgtttcattaccgaacactgataaagtagctgatcacctaaccgct 
aaagttaaggttattttagctgatggctcatttgtcactgtaaatgttccagtcaaagtt 
gtagaaaaagaattacaaatagctaaaaaggatgctataaaaacaattgatgttctggta 
aaacaaaaaatcaaagatatagattctaataacgaattaacgtctactcaacgtgaagat 

gcaaaagctgaaattgaaagattgaaaaagcaagccatcgataaagtgactcattctaaa
tcgattaaagatattgaaacagtaaaacgaactgattttgaagaaatagatcagtttgat 
cctaaacgctttacgctaaataaagctaaaaaggatatcattactgatgttaatactcaa 
atccaaaatggtttcaaagaaattgaaacaataaaaggtttaacttctaatgaaaaaact 
cagtttgataaacaattaactgcactacaaaaagaatttttagaaaaagtcgagcatgct 

cataatttagtagaattaaatcaattacaacaagagtttaataatagatatgaacatatt
ttaavtcc«-iagcacatttactaggtgaaaaacatatagcagaacataaattaggat.: tgtt 
gtagtaaacaaaactcagcaaatactaaataatcaatctgcttcttactttataaaacaa 
tgggcacttgatagaattaaacaaattcaactagaaacgatgaattcaattcgtggtgcg 
cataccgtaca 

Sequence 2964 

MGTLKSLVAKQPTVQKTSVYINEDQPEQSAYNDSITMGQTIINKTADPVLDKTLVDNAIS 
NISTKENALHGEQKLTTAKTEAINALNTLADLNTPQKEAIKTAINTAHTRTDVTAEQSKA 
NQINSAMHTLRQNISDNESVTNESNYINAEPEKQHAFTEALNNAKEIVNEQQATLDANSI 
NQKAQAILTTKNALDGEEQLRRAKENADQEINTLNQLTDAQRNSEKGLVNSSQTRTEVAS
QLAKAKELNKVMEQLNNLINGKNQMINSSKFINEDANQQQAYSNAIASAEVLKNKSQNPE 
LDKVTIEQAINNINSAINNLNGEAKLTKAKEDAVASINNLSGLTNEQKTKENQAVNGSQT
RDQVANVLRDSKALDQSMQTLRDLVNNQNVIHSTSNYFNEDSTQKNTYDNAIDNGSTYIT 
GQHNSELNKSTIDQTISQINTAKNDLHGAEKLQRDKGTANQEIGQLGYLNDPQKSAEESL
VNGSNTRSEVEEHLNEAKSLNNAMKQLRDKVAEKTNVKQSSDYINDSTEHQRGYDQALQE 
AENIINEIGNPTLNKSEIEQKLQQLTDAQNALQGSHLLEEAKNNAITEINKLTALNDAQR 
QKAIENVQAQQTIPAVNQQLTLDREINTAMQALRDKVGQQNNVHQQSNYFNEDEQPKHNY 
DNSVQAGQTIIDKLQDPIMNKNEIEQAINQINTTQTALSGENKLHTDQESTNRQIEGLSS 

LNTAQINAEKDLVNQAKTRTDVAQKLATAKEINSAMSNLRDGIQNKEDIKRSSAYINADP
TKVTAYDQALQNAENIINATPNVELNKATIEQALSRVQQAQQDLDGVQQLANAKQQATQT 
VNGLNSLNDGQKRELNLLINSANTRTKVQEELNKATESNHAMEALRNSVQNVDQVKQSNN 
YVNEDQPEQHNYDNAVNEAQATINNNAQPVLDKLAIERLTQTVNTTKDALHGTQKLIQDQ 
QAAETGIRGLTSLNEPQKNAEVAKVTAATTRDEVRNIRQEATTLDTAMLGLRKSIKDKND 

TKNSSKYINEDHDQQQAYDNAVNNAQHVIDETQATLSSDTINQLANAVTQAKSNLHGDTK
LQHDKDSAKQTIAQLQNLNSAQKHMEDSLIDNESTRTQVQHDLTEAQALDGLMGALKESI 
KDNTNIVSNGNYINAEPSKKQAYDAAVQNAQNIINGTNQPTINKGNVTTATQTVKNTKDA 
LDGDHRLEEAKNNANQTIRNLSNLNNAQKDAEKNLVNSASTLEQVQQNLQTAQQLDNAMG 
ELRQSIANKDQVKADSKYLNEDPQIKQNYDDAVQRVETIINETQNPELLKANIDQATQSV 

QNAEQALHGAEKLNQDKQTSSTELDGLTDLTDAQREKLREQINTSNSRDDIKQKIEQAKA
LNDAMKKLKEQVAQKDGVHANSDYTNEDSAQKDAYNNALKQAEDIINNSSNPNLNAQDIT 
NALNNIKQAQDNLHGAQKLQQDKNTTNQAIGNLNHLNQPQKDALIQAINGATSRDQVAEK
LKEAEALDEAMKQLEDQVNQDDQISNSSPFINEDSDKQKTYNDKIQAAKEIINQTSNPTL
DKQKIADTLQNIKDAVNNLHGDQKLAQSKQDANNQLNHLDDLTEEQKNHFKPLINNADTR 

DEVNKQLEIAKQLNGDMSTLHKVINDKDQIQHLSNYINADNDKKQNYDNAIKEAEDLIHN
HPDTLDHKALQDLLNKIDQAHNELNGESRFKQALDNALNDIDSLNSLNVPQRQTVKDNIN
HVTTLESLAQELQKAKELNDAMKAMRDSIMNQEQIRKNSNYTNEDLAQQNAYNHAVDNIN 
NIIGEDNATMDPQIIKQATQDINTAINGLNGDQKLQDAKTDAKQQITNFTGLTEPQKQAL 
ENIINQQTSRANVAKQLSHAKFLNGKMEELKVAVAKASLVRQNSNYINEDVSEKEAYEQA 

IAKGQEIINSENNPTISSTDINRTIQEINDAEQNLHGENKLRQAQEIAKNEIQNLDGLNS
AQITKLIQDIGRTTTKPAVTQKLEEAKAINQAMQQLKQSIADKDATLNSSNYLNEDSEKK 
LAYDNAVSQAEQLINQLNDPTMDISNIQAITQKVIQAKDSLHGANKLAQNQADSNLIINQ 
STNLNDKQKQALNDLINHAQTKQQVAEIIAQANKLNNEMGTLKTLVEEQSNVHQQSKYIN 
EDPQVQNIYNDSIQKGREILNGTTDDVLNNNKIADAIQNIHLTKNDLHGDQKLQKAQQDA 

TNELNYLTNLNNSQRQSEHDEINSAPSRTEVSNDLNHAKALNEAMRQLENEVALENSVKK
LSDFINEDEAAQNEYSNALQKAKDIINGVPSSTLDKATIEDALLELQNARESLHGEQKLQ 
EAKNQAIAEIDNLQALNPGQVLAEKTLVNQASTKPEVQEALQKAKELNEAMKALKTEINK 
KEQIKADSRYVNADSGLQANYNSALNYGSQIIATTQPPELNKDVINRATQTIKTAENNLN 
GQSKLAEAKSDGNQSIEHLQGLTQSQKDKQHDLINQAQTKQQVDDIVNNSKQLDNSMNQL 

QQIVNNDNTVKQNSDFINEDSTQQDAYNHAIQAAKDLITAHPTIMDKNQIDQAIENIKQA
LNDLHGSNKLSEDKKEASEQLQNLNSLTNGQKDTILNHIFSAPTRSQVGEKIASAKQLNN 
TMKALRDSIADNNEILQSSKYFNEDSEQQNAYNQAVNKAKNIINDQPTPVMANDEIQSVL
NEVKQTKDNLHGDQKLANDKTDAQATLNALNYLNQAQRGNLETKVQNSNSRPEVQKVVQL 
ANQLNDAMKKLDDALTGNDAIKQTSNYINEDTSQQVNFDEYTDRGKNIVAEQTNPNMSPT 

NINTIADKITEAKNDLHGVQKLEQAQQQSINTINQMTGLNQAQKEQLNQEIQQTQTRSEV
HQVINKAQALNDSMNTLRQSITDEHEVKQTSNYINETVGNQTAYNNAVDRVKQIINQTSN 
PTMNPLEVERATSNVKTSKDALHGERELNDNKNSKTFAVNHLDNLNQAQKEALTHEIEQA 
TIVSQVNNIYNKAKALNNDMKKLKDIVAQQDNVRQSNNYINEDSTPQNMYNDTINHAQSI 
IDQVANPTMSHDEIENAINNIKHAINALDGEHKLQQAKENANLLINSLNDLNAPQRDAIN 

RLVNEAQTREKVAEQLQSAQALNDAMKHLRNSIQNQSSVRQESKYINASDAKKEQYNHAV
REVENIINEQHPTLDKEIIKQLTDAVNQANNDLNGVELLDADKQNAHQSIPTLMHLNQAQ 
QNALNEKINNAVTRAKVAAIIGQAKILDHAMENLEESIKDKEQVKQSSNYINEDPDVQET 
YNNAVDHVTEILNQTVNPTLSIEDIEHAINEVNQAKKQLRGKQKLYQTIDLADKELSKLD
DLTSQQSSSISNQIYTAKTRTEVAQAIEKAKSLNHAMKALNKIYKNADKVLDSSRFINED 

QPEKEAYQQAINHVDSIIHRQTNPEMDPTVINSITHELETAQNNLHGDQKLAHAKQDAAN
VINGLIHLNVAQREVMINTNTNATTREKVAKNLDNAQALDKAMETLQQVVAHKNNILNDS 
KYLNEDSKYQQQYDRVIADAEQLLNQTTNPTLEPYKVDIVKDNVLANEKILFGAEKLSYD 
KSNANDEIKHMNYLNNAQKQSIKDMISHAALRTEVKQLLQQAKTLDEAMKSLEDKTQVVI 
TDTTLPNYTEASEDKKEKVDQTVSHAQAIIDKINGSNVSLDQVRQALEQLTQASENLDGD 
QRVEEAKVHANQTIDQLTHLNSLQQQTAKESVKNATKLEEIATASNNALALNKVMGKLEQ
FINHADSIENSDNYRQADDDKIIAYDDALEHGQDIQKSNATQNEAKQALQQLINAETSLN
GFERLNHARPRALEYIKSLEKINNAQKSALEDKVTQSHDLLELEHLVNEGTNLNDIMGEL 
ANAIVNNYAPTKASINYINADNLRKDNFTQAINNARDALNKTQGQNLDFNAIDTFKDDIF 
KTKDALNGIERLTAAKSKAEKLIDSLKFINKAQFTHANDEIMNTNSIAQLSRIVNQAFDL
NDAMKSLRDELNNQAFPVQASSNYINSDEDLKQQFDHALSNARKVLAKENGKNLDEIQIE 
GLKQVIEDTKDALNGIQRLSKAKAKAIQYVQSLSYINDAQRHIAESNIHNSDDLSSLANT 
LSKASDLDNAMKDLRDTLESNSTSVPNSVNYINADKNLQIEFDEALQQASATSSKTSENP 
ATIEEVLGLSQAIYDTKNALNGEQRLATEKSKDLKLIKGLKDLNKAQLEDVTNKVNSANT 

LTELSQLTQSTLKLNDKMKLLRDKLKTLVNPVKASLNYRNADYNLKRQFNKALKEAKGVL
NKNSGTNVNINDIQHLLTQIDNAKDQLNGERRLKEHQQKSEVFIIKELDILNNAQKAAII 
NQIRASKDIKIINQIVDNAIELNDAMQGLKEHVAQLTATTKDNIEYLNADEDLKIQYDYA 
INLANNVLDKENGTNKDANIIIGMIQNMDDARALLNGIERLKDAQTKAHNDIKDTLKRQL
DEIEHANATSNSKAQAKQMVNEEARKAFSNINHATSNDLVNQAKDEGQSAIEHIHADELP

KAKLDANQMIDQKVEDINHLISQNPNLSNEEKNKLISQTNKLVNGIKNEIQQAINKQQIE
NATTKLDEVIETTKKLIIAKAEAKQVIKELSQKKRDAINNNTDLTPSQKAHALADIDKTE 
KDALQHIENSNSIDDINNNKEHAFNTLAHIIIWDTDQQPLVFELPELSLQNALVTSEVVV
HRDETISLESIIGAMTLTDELKVNIVSLPNTDKVADHLTAKVKVILADGSFVTVNVPVKV 
VEKELQIAKKDAIKTIDVLVKQKIKDIDSNNELTSTQREDAKAEIERLKKQAIDKVTHSK 

SIKDIETVKRTDFEEIDQFDPKRFTLNKAKKDIITDVNTQIQNGFKEIETIKGLTSNEKT
QFDKQLTALQKEFLEKVEHAHNLVELNQLQQEFNNRYEHILNQAHLLGEKHIAEHKLGYV 
VVNKTQQILNNQSASYFIKQWALDRIKQIQLETMNSIRGAHTVX

Sequence 2965 

Contig_0541_pos_1894_2919

is similar to (with p-value 2.0e-28) 

>gp:gp|AB001577|AB001577_1 Pseudomonas sp. DNA for low spec
ificity L-threonine aldolase, complete cds. NID: g2865133.
gtgatttcatttgaaaatgattatttagaaggtgcacatgaaaaagttttaaatcgatta 

gtagagacaaatcgaatacaagctgctggatatggcttcgatgacttttcggcacaagct
gcagataaaattagacaacgtattgactgtccagatgctaccattcgttttttagtaggt 
ggtacgcaaaccaatcaagtagttattaactcaatgcttgatagttatgaaggtgttata 
tccgctgatacaggacatgtggcagtccatgaaggtggtgcgatagaattcagtggacat
aaagttctaaccataccctcccaagaaggtaagattactgctcaagacgttgagaattat

atagaaacttttgaaagtgattttaaaaaagaacacatggtgtatccagggatggtttat
atttcacatccaaccgaatatggaactttatacacgaaagaagaattacaatctttatct 
agagtttgccgtagacatcagattccactatttatggatggtgcacgtttaggctatggc 
cttatgagcaatcaaactaatgtaactatcgaagatgttgcaaaatactgtgatgtgttt 
tacataggaggtactaagattggagcactttgtggtgaagcaattgtcttcactaaacaa 

aatgaacctaaaaacttcactacaattataaaacatcatggtgctttattagcaaaaggc
cgtctaactggtgttcaatttttagaattattcactgatgatttatattttgatataagt 
cgacatgctattaaaatggctgaaaaggtaaaaaaaggatttatagataaaggatatcaa 
gtctattttgattcaccaaccaatcaacaattttttattttaagcaacgataaaattgaa 
gaactaaaacaaaaggtaaaattcgcagtttgggagaaatacgataatcaacatcgtgta 

gttcgcttcgcaacaagttgggccacaactgaagaaaatgttaatcaactacttgaacta
atataa 

Sequence 2966 

VISFENDYLEGAHEKVLNRLVETNRIQAAGYGFDDFSAQAADKIRQRIDCPDATIRFLVG 
GTQTNQVVINSMLDSYEGVISADTGHVAVHEGGAIEFSGHKVLTIPSQEGKITAQDVENY
IETFESDFKKEHMVYPGMVYISHPTEYGTLYTKEELQSLSRVCRRHQIPLFMDGARLGYG 
LMSNQTNVTIEDVAKYCDVFYIGGTKIGALCGEAIVFTKQNEPKNFTTIIKHHGALLAKG 
RLTGVQFLELFTDDLYFDISRHAIKMAEKVKKGFIDKGYQVYFDSPTNQQFFILSNDKIE
ELKQKVKFAVWEKYDNQHRVVRFATSWATTEENVNQLLELI*

Sequence 2967 
Contig_0541_pos_7422_6532
is similar to (with p-value 3.0e-74) 
>sp:sp|P54204|FUR_STAEP FERRIC UPTAKE REGULATION PROTEIN. >
gp:gp|X97011|SEFURSOD_1 S.epidermidis genes fur and sod (par
tial). NID: g1263907.

atgttaataggatcacatgtttcaatgagtggcaaaaaaatgctgcaagggtcagcagaa 
gaagcacataaatatggtgaatctacatttatgatttatacaggtgcgcctcaaaataca 
agacgtaaaaatattgaagatttaaatatcgaaaaaggccagcaggcaatgaaaar.ritat
ggcttatcaaatatcgttgtacatgcaccatatatcattaacattgcaaatacaaccaaa
cctgaagtatttaatttaggagtcgactttctacaaaaagaaatcgaaagaactcaagcg 
ctcggagcgaaagatattgtactgcatcctggagcgcatgtcggagcaggtgtagataaa 
ggaattcaaaaaattattgaaggacttaatgaagtactcacacatgataatgatgtaaga

atagcacttgaaactatggcgggtaaaggaacagaagtagggagatcttttgaagaaatt
gctcaaataattgatggtgttacacataatgatcgcttatcagtatgttttgatacgtgc 
cacactcatgatgccggttataacgtcaaagaagatttcgatggtgtactagaaaaattc 
gacagcataattggagtagatcgaattaaagtagtacatgttaatgacagtaaaaaccta 
agaggtgcacagaaggatcgtcacgaaaatatcggctttggtcatattggctttgatgca 

cttaattacgtagtacatcatgatacttttaaaaatattcccaaaatattagaaactcca
tatgttggtgaagataaaaaaaataaaaaaccaccgtataaattagaaatagacatgtta 
aaatcacaaaaatttgatccagaactcaaaaacaaaattttaactcaataa 

Sequence 2968 

MLIGSHVSMSGKKMLQGSAEEAHKYGESTFMIYTGAPQNTRRKNIEDLNIEKGQQAMKTY
GLSNIVVHAPYIINIANTTKPEVFNLGVDFLQKEIERTQALGAKDIVLHPGAHVGAGVDK 
GIQKIIEGLNEVLTHDNDVRIALETMAGKGTEVGRSFEEIAQIIDGVTHNDRLSVCFDTC 
HTHDAGYNVKEDFDGVLEKFDSIIGVDRIKVVHVNDSKNLRGAQKDRHENIGFGHIGFDA 
LNYVVHHDTFKNIPKILETPYVGEDKKNKKPPYKLEIDMLKSQKFDPELKNKILTQ* 

Sequence 2969 
Contig_0541_pos_4647_4228

>sp:sp|P54476|END4_BACSU PROBABLE ENDONUCLEASE IV (EC 3.1.2
1.2) (ENDODEOXYRIBONUCLEASE IV). >gp:gp|D84432|BACJH642_144
Bacillus subtilis DNA, 283 Kb region containing skin element
. NID: g2627063. >gp:gp|Z99116|BSUB0013_223 Bacillus subtili
s complete genome (section 13 of 21): from 2395261 to 261373
0. NID: g2634723.

atgaatacaaatgatgcaattaaagttttaaaggaaaacggacttaaatatactgataaa 
cgtaaagatatgctagatatctttgttaaagaggataaatatttaaatgctaaacatatt
caacaacaaatggataaagactatcctggaatatcatttgatactgtatacagaaatctt 
catttatttaaagatttaggcattatagagagtaccgaattagatggagaaatgaaattc 
agaatcgcatgcacaaatcatcaccatcatcattttatttgcgaaaattgcggagaaact 
aaagtgattgatttttgtccaatagaaaagattaaaagtcaattacccaatgtaaatatt 
catactcataaattagaagtgtatggtatttgtgaagaatgtcaacgtaaagcaaactaa

Sequence 2970 

MNTNDAIKVLKENGLKYTDKRKDMLDIFVKEDKYLNAKHIQQQMDKDYPGISFDTVYRNL 
HLFKDLGIIESTELDGEMKFRIACTNHHHHHFICENCGETKVIDFCPIEKIKSQLPNVNI
HTHKLEVYGICEECQRKAN*

Sequence 2971 

Contig_0542_pos_5747_5274

is similar to (with p-value 2.0e-17) 

>sp:sp|P41893|PPAL_SCHPO LOW MOLECULAR WEIGHT PHOSPHOTYROSI
NE PROTEIN PHOSPHATASE (EC 3.1.3.48) (LOW MOLECULAR WEIGHT C
YTOSOLIC ACID PHOSPHATASE) (EC 3.1.3.2) (PTPASE) (SMALL TYRO
SINE PHOSPHATASE). >pir:pir|A55446|A55446 protein-tyrosine-p
hosphatase (EC 3.1.3.48), low molecular weight - fission yea
st (Schizosaccharomyces pombe) >gp:gp|L33929|YSPLMPTP_1 Schi
zosaccharomyces pombe low Mr protein tyrosine phosphatase mR
NA, complete cds. NID: g602991.

gtgatactaatgatacatgtagcatttgtatgtctcggtaatatatgtcgttctccaatg 
gctgaggctatcatgagacaaagactacaagaaagaggtatttcagatataaaagttcat 
tctagaggaacaggacgttggaatttaggcgaacctccacataacggaacacaaaaaatt
ctacagaagtaccatattccttatgatggtatggtgagtgaacttttcgaacctgatgat 
gattttgactatattattgctatggaccaaagtaacgtagacaatatcaaacaaatcaat 
ccaaatttacaaggacaattgttcaaattgctagaatttagtaacatggaagagagtgat 
gtaccagatccatactacacaaataattttgaaggtgttttcgagatggtgcaatcatct
tgtgataatttaatagactacatcgtaaaagatgcaaatttgaaagagaggtaa 

Sequence 2972 

VILMIHVAFVCLGNICRSPMAEAIMRQRLQERGISDIKVHSRGTGRWNLGEPPHNGTQKI 
LQKYHIPYDGMVSELFEPDDDFDYIIAMDQSNVDNIKQINPNLQGQLFKLLEFSNMEESD
VPDPYYTNNFEGVFEMVQSSCDNLIDYIVKDANLKER* 

Sequence 2973 

Contig_0546_pos_1878_160

>sp:sp|P17444|BETA_ECOLI CHOLINE DEHYDROGENASE (EC 1.1.99.1
) (CHD). >gp:gp|X52905|ECBET S Escherichia coli betT, betI,
betB and betA genes. NID: g48714. >gp:gp|M77738|ECOBETA_1 E.
coli choline dehydrogenase (betA) gene, complete cds. NID: g
145401. >gp:gp|AE000138|AE000138_2 Escherichia coli K-12 MG1
655 section 28 of 400 of the complete genome. NID: g1786501.

atgagaagaaaacgcgattcatacgattatgtcatcattggtggcggtagtgcaggttca 
gttcttggtgcacgcctttcagaggataaagataaaaatgttttggtattagaagctgga 
cgtagtgactatttctgggatttatttattcaaatgccagcagcattgatgttcccatca

ggtaatcgtttttatgactgggaatatcaaactgacgaagaaccacatatgggacgtaga
gtagatcatgcgagaggtaaagtattaggtggctcaagttctattaacggtatgatttat 
caacgaggtaacccaatggactatgaaggatgggcagaacctgaaggaatggacacatgg 
gactttgcacattgtctaccatacttcaaaaagttagaaacaacatatggtgcagcgcca 
tacgataaagttagaggccatgatggtccaatcaaattaaaacgtggaccagctactaat 

ccattatttaaatcattctttaatgcaggtgttgaagcgggctatcataaaactgcagac
gttaatggatacagacaagaaggttttggaccatttgatagccaagtacatcatggacgt 
cgtatgtctgcttcaagagcgtatctacgcccagcattaagacgtagaaacttagatgtt
gaaacacgtgcattcgttacaaaattaatttttgatgaaaataatagtaaaaaagtaaca 
ggcgtgactttcaagaaaaatggtaaagaacatactgttcatgcaaacgaagttatttta 

tctggcggtgctttcaatacaccacaactattacaattatcaggtattggtgactcagaa
ttcttaaaatcaaaaggtatagagccacgtatgcatttaccaggtgttggtgagaacttc 
gaagatcacttagaagtatatattcaacataaatgtaaacaaccggtttcactacaacct
agccttgatgtcaaacgtatgccgttcatcggtttacaatggatttttgcacgtaaaggt 
gcagcggcgtctaaccactttgaaggtggtggctttgtaagatcaaatgatgatgttgat 

tatccaaacctcatgttccatttcttaccaattgctgtaagatatgatggtcaaaaagca
ccagtagcacatggttaccaagtacatgttggaccaatgtactccaactcaagaggtagt 
ttgaaaatcaaatctaaggatccatttgaaaaaccaagtatcgtgtttaattacttatct 
acgaaagaagacgaaagagaatgggttgaagcaattagagtagcaagaaatatcctaaaa 
caaaaagctatggacccatttaatggtggcgaaatttcaccaggaccacaagttcaaacg 

gatgaagaaattctagattgggtacgtaaagatggagaaactgcattacatccatcttgt
agcgcgaaaatgggacctgcatctgacccaatggcagtagtcgatccattaactatgaaa 
gtacatggtatggaaaatttacgtgtcgttgatgcttcagcaatgcctagaacaacaaat 
ggtaatattcatgcacctgtattgatgttagctgagaaagcagcggacattattcgtggt 
agaaaaccgcttgaacctcaatatgttgactattataaacatggtattgatgatgaaaaa 

gcaggtgcaatggaagatgatccattctaccaatattaa

Sequence 2974 

MRRKRDSYDYVIIGGGSAGSVLGARLSEDKDKNVLVLEAGRSDYFWDLFIQMPAALMFPS 
GNRFYDWEYQTDEEPHMGRRVDHARGKVLGGSSSINGMIYQRGNPMDYEGWAEPEGMDTW 
DFAHCLPYFKKLETTYGAAPYDKVRGHDGPIKLKRGPATNPLFKSFFNAGVEAGYHKTAD
VNGYRQEGFGPFDSQVHHGRRMSASRAYLRPALRRRNLDVETRAFVTKLIFDENNSKKVT
GVTFKKNGKEHTVHANEVILSGGAFNTPQLLQLSGIGDSEFLKSKGIEPRMHLPGVGENF
EDHLEVYIQHKCKQPVSLQPSLDVKRMPFIGLQWIFARKGAAASNHFEGGGFVRSNDDVD 
YPNLMFHFLPIAVRYDGQKAPVAHGYQVHVGPMYSNSRGSLKIKSKDPFEKPSIVFNYLS 
TKEDEREWVEAIRVARNILKQKAMDPFNGGEISPGPQVQTDEEILDWVRKDGETALHPSC
SAKMGPASDPMAVVDPLTMKVHGMENLRVVDASAMPRTTNGNIHAPVLMLAEKAADIIRG
RKPLEPQYVDYYKHGIDDEKAGAMEDDPFYQY* 

Sequence 2975

Contig_0547_pos_6940_6371

is similar to (with p-value 2.0e-34) 

>pir:pir|S47148|S47148 hypothetical protein 1 - Staphylococ
cus carnosus >gp:gp|X79725|SCSECA_1 S.carnosus (TM300) secA
gene. NID: g499333.

atgattagatttgaaattcatggagataacctcactatcacagatgcaattcgcaactat 
attgaggagaaagtaggtaaattagaaagatactttaacaatgtgccaaatgctgtagca 
catgttagagttaaaacttattctaattctacaactaaaattgaagttacaattccttta 
aaagatgtcactcttagagctgaagaaagacatgatgatttatatgcaggcattgattta 

attactaacaaattggaaagacaagtacgtaaatacaaaactcgtgtaaatcgtaaacat
agagatcgtggagatcaagatatctttgttgctgaagtacaagagtctacaacaaacaat 
catgcagatgatatagaaagtgaaaatgatattgaaattattcgttctaaacaattcagc 
ttaaaaccaatggattctgaagaagcagtattgcaaatgaatttattgggacatgatttc 
tttatttttaccgatagagaaactgatggcacaagcattgtttatagacgaaaagatggc 

aaatacggtctgattgaaacaactgaataa

Sequence 2976 

MIRFEIHGDNLTITDAIRNYIEEKVGKLERYFNNVPNAVAHVRVKTYSNSTTKIEVTIPL 
KDVTLRAEERHDDLYAGIDLITNKLERQVRKYKTRVNRKHRDRGDQDIFVAEVQESTTNN 
HADDIESENDIEIIRSKQFSLKPMDSEEAVLQMNLLGHDFFIFTDRETDGTSIVYRRKDG
KYGLIETTE* 

Sequence 2977
Contig_0550_pos_2797_2369
is similar to (with p-value 5.0e-26)

>gp:gp|AF036166|AF036166_1 Xanthomonas campestris organic h
ydroperoxide resistance protein (ohr) gene, complete cds. NI
D: g3098341.

atggcaaattctatttactcaactacaatgatttcaaatggtggacgtgatggtcgcgtt 
tttagtccagataatacatttgttcaaaaccttgcaacacctaaggaaatgggtggtcaa
ggsggcaacgatactaatcctgaacaattatttgctgctggatatagtgcatgctttaat 
agcgcgctatcattaatcttatctcaaaataaaataagtgatgccaacccagaagttgaa 
atcactattgaattacttaaagatgatactgacaatggttttaaacttggcgcagatatt 
aaagtcacacttgaaaatatgtcccaacaagatgctgagaaatttgtggagcaagcacat 
caattctgtccatactcaaaagcgacacgtggtaacattgacgttcagttagatgttaca
gcgcaataa 

Sequence 2978 

MANSIYSTTMISNGGRDGRVFSPDNTFVQNLATPKEMGGQGGNDTNPEQLFAAGYSACFN 
SALSLILSQNKISDANPEVEITIELLKDDTDNGFKLGADIKVTLENMSQQDAEKFVEQAH
QFCPYSKATRGNIDVQLDVTAQ* 

Sequence 2979
Contig_0553_pos_7975_7067
is similar to (with p-value 1.0e-37)

>sp:sp|Q27546|IUNH_CRIFA INOSINE-URIDINE PREFERRING NUCLEOS
IDE HYDROLASE (EC 3.2.2.1) (IU-NUCLEOSIDE HYDROLASE) (PURINE
NUCLEOSIDASE).

atgtctattcccattataattgatactgatcctggtatagatgatgctacagcaattagt 
atcgcactttcacaccctcaatttgacgttaaaatgatatcaactgtgaatggtaatgta
ggtattgagaaaacgacagcaaatgcattaaagctaaaaaggttttttaatagttctgtt 
cctgtacatagaggggcatcccaaccattgattaatgacatctttgaagctacatcaatt 
catggtgagtctggtatggatggttacgagtttccacaaataaatcaagatgatttaaca 
tcaattcatgcagttgaagcaatgagaaatctattagtaaatactcaagaacctttaacc 
ttgattgccataggtccactaacaaatatcgctattcttttaactagttatcccgaagtt
caaccatttattaaggaaattgttttaatgggtggtagtaccggtagaggtaatgtaacg 
cctttagctgaatttaatatatattgtgatccagaagctgctcaaattgtatttaactct 
ggattacctttgactatgattggtcttgatttggctcgtgaagcattgtttactcaccat 
tttgtaaaagacttcaaagatacaaatgcaacttcaaacatgttatataatttatttcag
cactataagagtgaagattttgaaataggatttaaattatacgatgtattcactatatta 
tatttgttggatccagaagcctttaatgtcaaggaggcgtatactcaaatagaattaaat 
ggcaactttacaaggggagccacagtggtagactttaatatggagcatcccaattgtaca 
gttgttttaagtcctgttgaaagacagtatgaggatttattcttaaacgccctttcttat 
tgtaaataa

Sequence 2980 

MSIPIIIDTDPGIDDATAISIALSHPQFDVKMISTVNGNVGIEKTTANALKLKRFFNSSV
PVHRGASQPLINDIFEATSIHGESGMDGYEFPQINQDDLTSIHAVEAMRNLLVNTQEPLT 
LIAIGPLTNIAILLTSYPEVQPFIKEIVLMGGSTGRGNVTPLAEFNIYCDPEAAQIVFNS
GLPLTMIGLDLAREALFTHHFVKDFKDTNATSNMLYNLFQHYKSEDFEIGFKLYDVFTIL 
YLLDPEAFNVKEAYTQIELNGNFTRGATVVDFNMEHPNCTVVLSPVERQYEDLFLNALSY 
CK* 

Sequence 2981

Contig_0554_pos_5578_0 

is similar to (with p-value 9.0e-26)

>gp:gp|D87664|D87664_1 Thermus aquaticus DNA for DNA polyme
rase family X, aminopeptidase T, OAH/OAS sulfhydrylase, comp
lete cds. NID: g1526546.

atgacaaaaaaagatgtaattcaattattagaaaaaatagctatatatatggagctaaaa 
ggagaaaatacatttaaagtttcagcgtatagaaaagccgcacaaagtctagaggttgat 
gagcgtacattagaagagattgatgatgtaacagaacttaaaggcattggaaaaggcgta 
ggagaagttattaatgaatttaaaacacaaggtcaatcatcgacccttcaagcacttcaa 

gatgaagtacctgaagggttagtgccacttttgaaaatacaaggattaggcagtaaaaaa
atagcgaaactatatcatgaacttcaaattacagataaagaaatacttcaaaaagcctgt 
gaagaaggtaaggtcagtgctttaaaaggttttgcaaaaaagacagagcaaaacatttta 
gaagcagtgaagtcgatgggtgctaaaaaagatcgttatcctatagagctaatgagagga 
ctcaaccaagaaattgtaaaatttattgaacagttagaaggagttgaacaatattcaact 

gctggtagttttcgaagatataaggaaatgagtaaagatttagatttcataattagtaca
tcagagcctaaaaaagttcaacaacaattacttcgtattccgaataaagtcaaagatgtt 
gctgttggggatactaaaatttctctggaattagcttatgatgatgagacgattggcgtt 
gattttagattgatagaacctt 

Sequence 2982

MTKKDVIQLLEKIAIYMELKGENTFKVSAYRKAAQSLEVDERTLEEIDDVTELKGIGKGV 
GEVINEFKTQGQSSTLQALQDEVPEGLVPLLKIQGLGSKKTAKLYHELQITDKEILQKAC 
EEGKVSALKGFAKKTEQNILEAVKSMGAKKDRYPIELMRGLNQEIVKFIEQLEGVEQYST 
AGSFRRYKEMSKDLDFIISTSEPKKVQQQLLRIPNKVKDVAVGDTKISLELAYDDETIGV 

DFRLIEPX

Sequence 2983 
Contig_0555_pos_3938_2001 
>gp:gp|D85230|PEEGLTD_4 Plectonema boryanum URF141, ORF243, 
NADH-dependent glutamate synthase large subunit (gltB) and
small subunit (gltD) and URF289 genes, partial and complete
cds. NID: g1339947.

atgagttacggctctatctcagcagaggcacatgagacgttggctcaagctatgaatcaa 
attggaggtaaaagtaatagtggagaaggtggagaagattcttcacgttacgaaattcaa 
aaggatggaagtaataagataagtgcgattaagcaagttgcatcaggtcgttttggggtg
acgagtgattacttgcaacatgcaaaagaaattcaaattaaagtcgcacaaggcgctaaa 
ccaggggaaggtggacaactaccaggttcaaaagtatatccatggattgctgagactaga 
ggttcgacaccaggtataggattaatttcaccaccaccacaccatgatatttattcaatt 
gagg5ictiagcacagctcattcatgatttaaaaaatgcaaatagaagagctgatatcgca
gttaagcttgtatcaaaaactggcgttggaactatagcttcaggggtagctaaagctttc
gccgataaaattgttataagtggttatgatggaggtacaggtgcatcgcctaaaacaagt 
attcaacatgcaggtttgccatgggagataggccttgccgaaacacatcaaacacttaaa 
ttaaatgatttgcgtagtcgcgtaaaattagaaacggatggtaagttactgacgggtaaa 
gatgtagcttatgcttgtgcgcttggtgcagaagaatttggtttcgcaacagcaccactt
gttgttttggggtgtattatgatgagggtttgtcataacgatacgtgtccagtaggggtt 
gcaacacaaaacaaagatttaagagctttgtttagaggtaaggcacagcatgtagttaac 
tttatgtattttatagctgaagaattacgtgaaattttggcttcacttggtttagaaaca 
gtagaagagttagtaggaagaacagatcttcttcaacgttcgacgcaattgaaaccaaat 

agtaaagcagcttcgcttcaaatagaacgtttaatagaacaatttgacggggttaatacg
aaagagatatcacaaaaccatcatcttgatgaaggattcgatttgaattatctgtaccca 
gacgcacgctatagtattgaaaacgggcactcttttaccggaaattatgttgttaataat 
gaacagcgagatgtaggtgtaattacaggtagtgcgatagctaaacaatatggagaagaa 
ggattacctgaagatacgatacttgcttacactgaaggtcatgcaggtcaaagcttagct 

gcatatgcaccacgcggattaacaatccatcataccggtgatgctaatgactacgtaggt
aaaggattgtccggtggaactgtcatcgtaaatgctccaaatagtcaacgtgaaaatgaa 
attatagcaggaaatataaacttttacggggcttctagaggtaaagcgtttatcaatggt 
aaagctggtgagcgtttctgtatcagaaatagtggtgcagatgttgtagtagaaggtatt 
ggtgatcatggacttgaatatatgacagggggacatgtcattatcttaggagatgttgga 

aagaactttggccaaggcatgagcgggggcgtaagttatattttctcttctgacgtggag
aaatttaaaaaggttaatgcgcttgaaactttagaattcagtagcatacgttttgatgag 
gaaaaatctcttatcaaagacatgcttgaagcacattttaagcatacacgtagtaacaaa 
gcacgccaattacttgaccaatttgacaatattgaaaagttagcaattaaagttattccg 
aaagattacaaattaatgatgcaaaaaattgatttgaaaaaacgtcaaatggaacgtgaa 

gatgaagcaacactggcagcgttttatgatgacagagaaacaattgaacaagagctacag
ccagcagtcatttattaa 

Sequence 2984 

MSYGSISAEAHETLAQAMNQIGGKSNSGEGGEDSSRYEIQKDGSNKISAIKQVASGRFGV 
TSDYLQHAKEIQIKVAQGAKPGEGGQLPGSKVYPWIAETRGSTPGIGLISPPPHHDIYSI
EDLAQLIHDLKNANRRADIAVKLVSKTGVGTIASGVAKAFADKIVISGYDGGTGASPKTS
IQHAGLPWEIGLAETHQTLKLNDLRSRVKLETDGKLLTGKDVAYACALGAEEFGFATAPL
VVLGCIMMRVCHNDTCPVGVATQNKDLRALFRGKAQHVVNFMYFIAEELREILASLGLET
VEELVGRTDLLQRSTQLKPNSKAASLQIERLIEQFDGVNTKEISQNHHLDEGFDLNYLYP
DARYSIENGHSFTGNYVVNNEQRDVGVITGSAIAKQYGEEGLPEDTILAYTEGHAGQSLA
AYAPRGLTIHHTGDANDYVGKGLSGGTVIVNAPNSQRENEIIAGNINFYGASRGKAFING
KAGERFCIRNSGADVVVEGIGDHGLEYMTGGHVIILGDVGKNFGQGMSGGVSYIFSSDVE
KFKKVNALETLEFSSIRFDEEKSLIKDMLEAHFKHTRSNKARQLLDQFDNIEKLAIKVIP 
KDYKLMMQKIDLKKRQMEREDEATLAAFYDDRETIEQELQPAVIY* 

Sequence 2985 
Contig_0555_pos_1962_520

>sp:sp|P39812|GLTB_BACSU GLUTAMATE SYNTHASE [NADPH] LARGE C
HAIN (EC 1.4.1.13) (NADPH-GOGAT). >gp:gp|Z99113|BSUB0010_138
Bacillus subtilis complete genome (section 10 of 21): from
1781201 to 2014980. NID: g2634090. >gp:gp|Z99114|BSUB0011_9
Bacillus subtilis complete genome (section 11 of 21): from 2
000171 to 2207900. NID: g2634230.

atgaaatatgataaacagtcgctatcagaattgtctttggtagaccgtctttcgaatcat 
gaagcgtttcaacaacgcttcactaaagaagatgcttcgattcagggtgcgcgctatatg
gat cocao sacacctttttgtcaaactgggcaatcttatggaagagaaacaatag'^atgc 
cctattggtaattatatacctgagtggaacgacttagtctatcatcaagattttaaagct 
gcttacgaaagattgagagagacgaataattttcctgaatttacaggaagagtttgtcct 
gcaccatgtgagcaatcatgtgttatgaaaattaatagagaatccgtggcgattaaaggt 
attgaacgtacaattattgatgaagcatatgagaatgagtgggttcatcccgcatatcct
gaagatcataaagaccaacgagttgctatcgtaggtagtggtccagcgggacttacagca 
gctgaagaattaaactttaaaggctataaagttactgtttatgaaaaggcgcatgaacca 
ggcggcttgctaatgtatggtataccaaatatgaaactagataaagacgtaatacgtcga 
cgtgtatcacttatgaaagatgctggggttttatttaaaacaggcgttgaaattggcgtc
gatgtgagccgtgaaacacttgaagaaaattatgatgctattattttatgcacaggtgct
caaaatgcgagagatttaccattggaaggacgaatgggctctggtattcattttgcaatg 
gactatci.tactgaacaaacacagtatctaaatggtgagattgaaagtttgagca- tact 
gctaaagataagaatgtaattattataggtgctggtgatactggtgcagactgtgtagcg 
acagcattacgtgaaaactgtaaatctattgttcaatttaataaatatacgaaacagcct
gaagagattacttttgaaagtaatacttcctggccattagcaatgcctgttttcaaaatg 
gattatgcgcataaggaatatgaagctaaatttggtcaagaaccaagagcctatggtgta 
caaacaatgcgctatgatgttgacgagttaggaaatgttaaaggcttatatacacaaata 
ttaaaagaaacgcctgatggcatggtgatggaagatggaccagaacgattttggccggct 
gatttagtcttattatctatagggtttgttggtactgaaaccactgttccgcatgcgttt
gatatacacaccgagcgtaataaaattgtagctaatgatacaaattatcaaactaatcac 
gctaaaatatttgctgcaggagatgcaagacgaggtcagagtttggttgtttgggcaata 
aaagaaggtcgtgaagtagcacattctgttgatcaatacttaagtaaagaagttctagtg 
taa 

Sequence 2986 

MKYDKQSLSELSLVDRLSNHEAFQQRFTKEDASIQGARCMDCGTPFCQTGQSYGRETIGC 
PIGNYIPEWNDLVYHQDFKAAYERLRETNNFPEFTGRVCPAPCEQSCVMKINRESVAIKG 
IERTIIDEAYENEWVHPAYPEDHKDQRVAIVGSGPAGLTAAEELNFKGYKVTVYEKAHEP
GGLLMYGIPNMKLDKDVIRRRVSLMKDAGVLFKTGVEIGVDVSRETLEENYDAIILCTGA
QNARDLPLEGRMGSGIHFAMDYLTEQTQYLNGEIESLSITAKDKNVIIIGAGDTGADCVA
TALRENCKSIVQFNKYTKQPEEITFESNTSWPLAMPVFKMDYAHKEYEAKFGQEPRAYGV
QTMRYDVDELGNVKGLYTQILKETPDGMVMEDGPERFWPADLVLLSIGFVGTETTVPHAF 
DIHTERNKIVANDTNYQTNHAKIFAAGDARRGQSLVVWAIKEGREVAHSVDQYLSKEVLV 

Sequence 2987 

Contig_0557_pos_8699_9661 

is similar to (with p-value 1.0e-71)

>sp:sp|P76113|YNCB_ECOLI PUTATIVE NADP-DEPENDENT OXIDOREDUC
TASE IN TEHB-RHSE INTERGENIC REGION (EC 1.-.-.-). >gp:gp|D90
784|D90784_8 E.coli genomic DNA, Kohara clone #273(32.5-32.8
min.). NID: g1742353. >gp:gp|D90785|D90785_2 E.coli genomic
DNA, Kohara clone #274(32.7-33.0 min.). NID: g1742363.

atgccacaagacgatacatttaaatatgaagatatagatgttattgaaccttcagaaaat
gaattgcaattgaaaacattatatatatcggttgatccatatatgagaggacgtatgaca 
aatgctgattcttatgtagatccgttcaaacaaggggaaccgttcaatggacatacggta 
tctaaagttttgaaatccaaggatagtaattttgatgaaggtgatatagtagtgggtatg 
ctaccttggagaaaaataaatacagtaaatagtgagtatgttaacaaagtacctacttct 

gacgtaccattacatctttatcttagtgtgttggggatgcctggtcagactgcttatcat
ggattacttgatattggacaacctaaagaaggtgaaacagtagttatttcagcagcttca 
ggugcagttggttcagttgtgggccaaattgcgaagcttaaaggttgcagagtcg^itggt 
atagctggtggagataaaaaagtgaactatctaaaaaatgaacttcgttttgatgctggt 
atcgattacaaaaaagataatttccctgaagcgttaaaagaagcggtgcctaacggtata 

gatgtctacttcgaaaatgtaggtggatatattggcgatgaagtcttcaaacatctcaat
acacatgcaagaattcctgtttgtggtgcgatttcatcctataatcatccagaaaaagat 
attggaccacgcattcagcaaacattgattaaaaatcaagcaatgatgagaggtttcata
gtagcagaattcgctgatggttttaaagaagcgagcaaacaattagctcaatgggttcaa 
gagaataaaattaaaacacaagtttcagtagaagatggttttgataaagtgccgcaagcc 

tttagaaatctgctaactggtgataattttggtaaacaagttattaaagtggcaagtgaa
taa 

Sequence 2988 

MPQDDTFKYEDIDVIEPSENELQLKTLYISVDPYMRGRMTNADSYVDPFKQGEPFNGHTV 
SKVLKSKDSNFDEGDIVVGMLPWRKINTVNSEYVNKVPTSDVPLHLYLSVLGMPGQTAYH
GLLDIGQPKEGETVVISAASGAVGSVVGQIAKLKGCRVVGIAGGDKKVNYLKNELRFDAG 
IDYKKDNFPEALKEAVPNGIDVYFENVGGYIGDEVFKHLNTHARIPVCGAISSYNHPEKD 
IGPRIQQTLIKNQAMMRGFIVAEFADGFKEASKQLAQWVQENKIKTQVSVEDGFDKVPQA 
FRNLLTGDNFGKQVIKVASE*

Sequence 2989
Contig_0557_pos_9913_0 
is similar to (with p-value 3.0e-17)
>sp:sp|Q08257|QOR_HUMAN QUINONE OXIDOREDUCTASE (EC 1.6.5.5)
(NADPH:QUINONE REDUCTASE) (ZETA-CRYSTALLIN). >pir:pir|PN04
48|PN0448 zeta-crystallin / quinone reductase (NADPH) (EC 1.
6.-.-) - human >gp:gp|L13278|HUMQUINZ_1 Homo sapiens zeta-cr
ystallin/quinone reductase mRNA, complete cds. NID: g292414.

atgccagttgacaaagcgccacgtgtacttggctttgatgctgttggtgtgattgaaaag 
ataggagatcaagtgtcaatgtttcaagaaggggacgtcgttttttattcaggttctcct 
aaccaaaatggttcgaatgaagaataccaattaatagaggaatatttagtagctaaagca 
cct HcaaatttgaaaagtgaacaagcagctagcctacctttaactgggGtaacagcttat 
gaaacgcttttcgatgtttttggaatttcaaaagaaccatctgaaaataaaggtaaatca
ttgttaataattaatggagcaggtggtgtaggtagtattgcaacacagatagcgaaattt 
tatgggttgaaggttattacaactgcttcgagagaggatactataaagtggtctgttaat 
atgggtgctgatgttgtactgaatcataagaaagatttaagtc 

Sequence 2990

MPVDKAPRVLGFDAVGVIEKIGDQVSMFQEGDVVFYSGSPNQNGSNEEYQLIEEYLVAKA
PTNLKSEQAASLPLTGLTAYETLFDVFGISKEPSENKGKSLLIINGAGGVGSIATQIAKF 
YGLKVITTASREDTIKWSVNMGADVVLNHKKDLSX

Sequence 2991

Contig_0558_pos_11954_0

is similar to (with p-value 1.0e-22)

>pir:pir|A43577|A43577 regulatory protein pfoR - Clostridiu
m perfringens

gtgttatggggagtgtttgatatggatttgcttataggtacattatttttaattttagtt
ttagttgtttttactctatttacatataaggcacctagtgggatgaaagcgatgggtgct 
ttagcgaatgcggcaattgcttcgtttttagttgaagctttcaataaatatgtgggtgga 
caagtatttggtattaaattcttagaagaattaggagatgctgctggaggtttaggtggt 
gtggc.tgcggctggattaactgcattagcgattggagtatctccagtttatgcacV.agtt 

attggtgcagcttgtggaggtatggacttgttgccaggatttttcgctgggtatattgta
ggctacatgatgaagtataccgagaaatatgtgccagatggtattgatttaattg 

Sequence 2992 

VLWGVFDMDLLIGTLFLILVLVVFTLFTYKAPSGMKAMGALANAAIASFLVEAFNKYVGG 
QVFGIKFLEELGDAAGGLGGVAAAGLTALAIGVSPVYALVIGAACGGMDLLPGFFAGYIV
GYMMKYTEKYVPDGIDLIX 

Sequence 2993 
Contig_0558_pos_7113_6400
is similar to (with p-value 2.0e-21)

>pir:pir|A55345|A55345 diamine N-acetyltransferase (EC 2.3.
1.57) - Escherichia coli >gp:gp|D25276|ECOSN1A_1 Escherichia
coli gene for spermidine acetyltransferase, complete cds. N
ID: g517104. >gp:gp|AE000254|AE000254_5 Escherichia coli K-1
2 MG1655 section 144 of 400 of the complete genome. NID: g17
87862.

atgagtcacttcgaacattttactttagaacattaccactccctttttcataatgatcgg
ttoatgtcggtgatttttaatacagttgcagtggcacttttatctgcatcgattgcf.aca 
gttaizaggtacatttggtgcgatcgctttatattacttacgcaacaaacggtttaciggtt 
acgctattaacgatgaataatgtattaatggtatcttcagacgtagtcattggtgcttct
tttttaattatgttcactgcaataggacatttcacaggtttaggtttaggattttcaaca 
gtgctagcatctcatatagcgttctgtattccaattgttgttatcatcgtcttaccccaa 
ttatatgaaatgaatgataacatgttaaatgcagcaagagatttaggtgctaatgaatca 
caattattaactagcattattatacctaatattatgccctcgattataggaggattcttt
atggcattaacatattctctagatgattttacggtaagcttctttgttactggaaacgga
tttagtgtattgtctgtagaagtttatgctatggctcgaaaaggtataagtatggagatt 
aatgccatttctacaataatatttattgctattatgtttggagtatttggatattacttt 
attcaacatatcgttaatcgtcagaaaaagatgaagcgaggtgttaatgaatga 

Sequence 2994 

MSHFEHFTLEHYHSLFHNDRLMSVIFNTVAVALLSASIATVIGTFGAIALYYLRNKRFKV 
TLLTMNNVLMVSSDVVIGASFLIMFTAIGHFTGLGLGFSTVLASHIAFCIPIVVIIVLPQ 
LYEMNDNMLNAARDLGANESQLLTSIIIPNIMPSIIGGFFMALTYSLDDFTVSFFVTGNG 
FSVLSVEVYAMARKGISMEINAISTIIFIAIMFGVFGYYFIQHIVNRQKKMKRGVNE*

Sequence 2995 

Contig_0558_pos_1973_1638

is similar to (with p-value 1.0e-26)

>sp:sp|P23859|POTC_ECOLI SPERMIDINE/PUTRESCINE TRANSPORT SY
STEM PERMEASE PROTEIN POTC. >pir:pir|C40840|C40840 spermidin
e/putrescine transmembrane protein C - Escherichia coli >gp:
gp|D90747|D90747_3 Escherichia coli genomic DNA. (25.1 - 25.5
min). NID: g1651548. >gp:gp|AE000212|AE000212_10 Escherichi
a coli K-12 MG1655 section 102 of 400 of the complete genome
. NID: g1787358. >gp:gp|M64519|ECOPOTABCD_3 E.coli transport
protein (potA, potB, potC and potD) genes, complete cds. NI
D: g147325.

gtggtagagttattagaaattaactttatacatagaacttgtgaagtgttaattattatc 
gatccgcagtatgcaaataatgggtacgcgaaaaaagcctttaaaatggctattgactat
gcttttTcagtattaaatatgaataaggtatacttatatgtggatattaagaatgagaaa 
gcagtacatatctatcaaagtaataatttcgaaatagaaggaacgttaaaggaacacttc 
tatacaaggggagaatatagagattgctatgtaatgggcttgttaaaaaggaattgggtt 
aataagaatgatgatgatttgtctcatataagatga 

Sequence 2996 

VVELLEINFIHRTCEVLIIIDPQYANNGYAKKAFKMAIDYAFLVLNMNKVYLYVDIKNEK 
AVHIYQSNNFEIEGTLKEHFYTRGEYRDCYVMGLLKRNWVNKNDDDLSHIR*

Sequence 2997

Contig_0561_pos_2326_3213 

is similar to (with p-value 4.0e-30)

>sp:sp|P28246|BCR_ECOLI BICYCLOMYCIN RESISTANCE PROTEIN (SU
LFONAMIDE RESISTANCE PROTEIN).

atgatgatgactacaagttccaaacacttatctaaaatattaatcgttatacttggcgtt
atgactgcatttggtcctttgactattgatatgtacggaccatctttacctaaagttcag 
catgcgtttggttcatcaatttcagaaatacaacttacattatcctttgctatgataggt 
ttagctattggtcaatttgtattcgggccactatcagatgtattaggtcgtaaaaaaatg 
gcactcattttattgattggatatttaatagcctcattattatcagtttttacagttcat 

ttaaaar:tatttttaattatccgtttaattcaaggtttagcaggaggtggtgcaatcgtc
atagccaaagcttctattggagataactatgacggagacgaattagcaaaatttttaact 
tctcttatggtcataaacggtataatcaccatcattgctccactgttaggtggcctcgct 
ttatctattgcaagttggagaatgatttttatatttttaacaatcattaccttaatagtc 
atcttaggcattttattaaagatgccagttgggccccatcaagaacaatctcagttaaat 

tttaaagcaatatttaaagattttggtctgttattaacaaaacccaccttcgttattcca
atgttattgcaaggattaacttatgtcatgttattcagttattcgtcagccgcacctttt 
atttcacaaaagatgtatcatatgacaccacttcaatacagtgcaatgtttgctattaat 
ggagtgggtttgatagtcgtcagtcagataaccgctattatagtagaaaaggtaagccga 
tatgcgatgctcatatatttaacaatcattcaaatgttaggtgtttaa 

Sequence 2998 

MMMTTSSKHLSKILIVILGVMTAFGPLTIDMYGPSLPKVQHAFGSSISEIQLTLSFAMIG 
LAIGQFVFGPLSDVLGRKKMALILLIGYLIASLLSVFTVHLTIFLIIRLIQGLAGGGAIV 
IAKASIGDNYDGDELAKFLTSLMVINGIITIIAPLLGGLALSIASWRMIFIFLTIITLIV
ILGILLKMPVGPHQEQSQLNFKAIFKDFGLLLTKPTFVIPMLLQGLTYVMLFSYSSAAPF
ISQKMYHMTPLQYSAMFAINGVGLIVVSQITAIIVEKVSRYAMLIYLTIIQMLGV* 

Sequence 2999 
Contig_0561_pos_4708_4139

is similar to (with p-value 3.0e-49)

>sp:sp|P28368|YVYD_BACSU HYPOTHETICAL 22.0 KD PROTEIN IN FL
IT-SECA INTERGENIC REGION. >gp:gp|Z31376|BSFLIDST_5 B.subtil
is (HB2058) genes for FliD, FliS, FliT proteins. NID: g49937
9. >gp:gp|Z99122|BSUB0019_28 Bacillus subtilis complete geno
me (section 19 of 21): from 3597091 to 3809700. NID: g263602
9. >gp:gp|U56901|BSU56901_22 Bacillus subtilis putative tran
scriptional regulator (yvhJ), Ycr59c/YigZ homolog (yvhK), hi
stidine kinase (degS), transcriptional regulator of degradati
on enzyme (degU), (degV), (comFA), (comFB), (comFC), flagell
ar protein (yviB), negative regulator of flagellin (flgM), f
lagellar protein (yviC), flagellar-hook associated protein 1
(flgK), flagellar-hook associated protein 3 (flgL), (yviE),
transmembrane protein (yviF), (csrA), flagellin (hag), flag
ellar protein (yviH), flagellar hook-associated protein 2 (f
liD), flagellar protein (fliS), flagellar protein (fliT), si
gma-54 modulator homolog (yviI), and (secA) genes, complete
cds. NID: g1762326.

atgattagatttgaaattcatggagataacctcactatcacagatgcaattcgcaactat 
attgaggagaaagtaggtaaattagaaagatactttaacaatgtgccaaatgctgtagca
catgttagagttaaaacttattctaattctacaactaaaattgaagttacaattccttta 
aaagatgtcactcttagagctgaagaaagacatgatgatttatatgcaggcattgattta 
attactaacaaattggaaagacaagtacgtaaatacaaaactcgtgtaaatcgtaaacat 
agagatcgtggagatcaagatatctttgttgctgaagtacaagagtctacaacaaacaat 
catgcagatgatatagaaagtgaaaatgatattgaaattattcgttctaaacaattcagc
ttaaaaccaatggattctgaagaagcagtattgcaaatgaatttattgggacatgatttc 
tttatttttaccgatagagaaactgatggcacaagcattgtttatagacgaaaagatggc 
aaatacggtctgattgaaacaactgaataa 

Sequence 3000

MIRFEIHGDNLTITDAIRNYIEEKVGKLERYFNNVPNAVAHVRVKTYSNSTTKIEVTIPL 
KDVTLRAEERHDDLYAGIDLITNKLERQVRKYKTRVNRKHRDRGDQDIFVAEVQESTTNN
HADDIESENDIEIIRSKQFSLKPMDSEEAVLQMNLLGHDFFIFTDRETDGTSIVYRRKDG 
KYGLIETTE* 

Sequence 3001 

Contig_0562_pos_12885_13190
is similar to (with p-value 1.0e-20)
>sp:sp|P46378|FAS6_RHOFA HYPOTHETICAL 21.1 KD PROTEIN IN FA
SCIATION LOCUS (ORF6). >pir:pir|F55578|F55578 hypothetical p
rotein 2 (ipt 3' region) - Rhodococcus fascians plasmid pFiD
188 >gp:gp|Z29635|RFCCIPTFD_6 R.fascians (D188) genes for P4
50 cytochrome, isopentenyltransferase and ferredoxine. NID:
g455000.

gtgttgaaaaagatacatgggttattgttttgtccaactcttgaccatgatacaacttcc
gttatttcttctaaagttccaggtccaccaggaagtgctatgcatacatcgcctttcttt 
aagataacttcttttctttcagacatactttccacgattattagttcgtctaatttatca 
tgtgctaactctctttgttttaaaaacgtaggcattacaccaattgtttttccgttgtta 
tgtattacagtatttgcaattgttcccattaatcctgcatttccgccaccaaaaactaaa 

gtgtga

Sequence 3002 

VLKKIHGLLFCPTLDHDTTSVISSKVPGPPGSAMHTSPFFKITSFLSDILSTIISSSNLS 
CANSLCFKNVGITPIVFPLLCITVFAIVPINPAFPPPKTKV* 



746 



WO 01/34809



PCT/US00/30782



Sequence 3003 

Contig_0562_pos_13285_12725
is similar to (with p-value 8.0e-24)
>sp:sp|P46378|FAS6_RHOFA HYPOTHETICAL 21.1 KD PROTEIN IN FA
SCIATION LOCUS (ORF6). >pir:pir|F55578|F55578 hypothetical p
rotein 2 (ipt 3' region) - Rhodococcus fascians plasmid pFiD
188 >gp:gp|Z29635|RFCCIPTFD_6 R.fascians (D188) genes for P4
50 cytochrome, isopentenyltransferase and ferredoxine. NID:
g455000.

atgaatattatagtctattgtggagcaagcaagggaaacaaaaaagaatacgaaaatagt 
gcaattcaattaggtgaatggatagctaaaaataatcacactttagtttttggtggcgga 
aatgcaggattaatgggaacaattgcaaatactgtaatacataacaacggaaaaacaatt 
ggtgtaatgcctacgtttttaaaacaaagagagttagcacatgataaattagacgaacta 

ataatcgtggaaagtatgtctgaaagaaaagaagttatcttaaagaaaggcgatgtatgc
atagcacttcctggtggacctggaactttagaagaaataacggaagttgtatcatggtca 
agagttggacaaaacaataacccatgtatctttttcaacacaaataattattattccctt 
atcgaacaattctacgatcaaatggtttcaaacgagtttttaactcaagaagatagagat 
aaaatattattctcaaactcattccaagaaattgaagaatttatagaaaactataaaaca 

ccaaaaataagaacttattaa

Sequence 3004 

MNIIVYCGASKGNKKEYENSAIQLGEWIAKNNHTLVFGGGNAGLMGTIANTVIHNNGKTI 
GVMPTFLKQRELAHDKLDELIIVESMSERKEVILKKGDVCIALPGGPGTLEEITEVVSWS 
RVGQNNNPCIFFNTNNYYSLIEQFYDQMVSNEFLTQEDRDKILFSNSFQEIEEFIENYKT
PKIRTY* 

Sequence 3005
Contig_0563_pos_5497_5093
is similar to (with p-value 4.0e-71)

>gp:gp|AF046871|AF046871_3 Anabaena PCC7120 heterocyst-inhi
biting signaling peptide (patS) and holiday junction resolva
se (ruvC) genes, complete cds; and unknown genes. NID: g2896
023.

atgttagctaataatggtttaatcgcgattaatctcgcttatcagaatttagaaagagca
tttgttcaagatgtttctgatattgaatccaaacttacgttagcagcgacacctaagctc 
gcatcaaaatcagctattagagaaagtatacgcttagcaattgttcctacaattgattct 
gtaaaaacatatggtctagtttcaattccaggtatgatgacaggattgattatcggaggc 
gttgacccacttcaagcaattaaatttcaattgcttgtcgtgtttattcatacaacagcg 

acgattatgtctgcactcattgcaacgtatatgagttacggtcagttctttaatgctcgt
catcaactcattgctagaacgcaacgcacaagacaaagtagttaa 

Sequence 3006 

MLANNGLIAINLAYQNLERAFVQDVSDIESKLTLAATPKLASKSAIRESIRLAIVPTIDS 
VKTYGLVSIPGMMTGLIIGGVDPLQAIKFQLLVVFIHTTATIMSALIATYMSYGQFFNAR
HQLIARTQRTRQSS*

Sequence 3007
Contig_0563_pos_1994_1008
is similar to (with p-value 2.0e-32)

>sp:sp|P77307|YBBM_ECOLI HYPOTHETICAL 28.2 KD PROTEIN IN US 
HA-TESA INTERGENIC REGION. 

atgaatgtagtacttattggtggtggcactggactttctgtccttgctagaggccttaga 
gaatttccaatagacattactgccattgttactgtagcggacaatggtgggagcacgggg 
aaaattagagatgtcatggatattccagcgcctggtgatattcgtaatgtcattgctgct
ttaagtgactcagaatcgatattaactcaattgttccagtaccgttttggtgaaaatcaa 
gtagatgggcattcattaggtaatttagttattgctggaatgactaacattactaatgat 
tttggacacgctattaaagagttaagcaaagttttaaatattaaaggccaagtcatccct 
tcaacaaacgcaagtgtgcaactcaacgcggtgatggaagacggtgaaattgtacatgga
gaaactaatatacctaaaacacataaaaaaatagatcgtgtgtttttagaaccaagtgat
gttgaaccaatgaatgaagcgatagaagctttagaacaagcagatttaattgtcttagga 
ccaggttcattatatacaagtgttatatcaaatttatgtgtcaaaggtatttcagaagca 
ttattacgtacatctgctccaaaactttatgtatctaatgttatgacacaaccaggcgag 
actgataattatgatgtcaaagagcatattgatgcacttactcgacaagttggtgaacca
tttattgattttgtcatatgtagctcagaatcctatagtaaagatgttttacaacgatat 
gaagaaaagaattcgaaaccagtagcagtacataaagaacaattaaaagatagtggaatt 
agagttttaacggcatctaatttagttgaaatatctaatgaacactatgtcagacataac 
acaaaagtattatcaaaaatgatttatgaacttgccttagaattaacaagtacaattcgc 
tttactcctagtgataaaaagaaataa

Sequence 3008 

MNVVLIGGGTGLSVLARGLREFPIDITAIVTVADNGGSTGKIRDVMDIPAPGDIRNVIAA 
LSDSESILTQLFQYRFGENQVDGHSLGNLVIAGMTNITNDFGHAIKELSKVLNIKGQVIP 
STNASVQLNAVMEDGEIVHGETNIPKTHKKIDRVFLEPSDVEPMNEAIEALEQADLIVLG
PGSLYTSVISNLCVKGISEALLRTSAPKLYVSNVMTQPGETDNYDVKEHIDALTRQVGEP
FIDFVICSSESYSKDVLQRYEEKNSKPVAVHKEQLKDSGIRVLTASNLVEISNEHYVRHN 
TKVLSKMIYELALELTSTIRFTPSDKKK* 

Sequence 3009

Contig_0564_pos_5882_6601 

is similar to (with p-value 7.0e-36)

>sp:sp|P49309|MOCR_RHIME PROBABLE RHIZOPINE CATABOLISM REGU
LATORY PROTEIN MOCR. >pir:pir|S51574|S51574 mocR protein - R
hizobium meliloti >gp:gp|X78503|RMMOCCABR_6 R.meliloti mocC,
ORF334, ORF293, mocA, mocB and mocR genes. NID: g468758.
atgattatcgttgctacagctactggggatatgcctttcccaagcgtggctaacatctta 
caagaaagactaggaacaggtaaggttgctactatggatcaacttgctgcatgttctggc 
tttatgtattcaatgattactgctaaacaatatatacaatctggtgattacaaacatatt 

ttagttgtgggtgctgataaattatctaagattaccgatatgactgaccgttctactgct
gtattatttggagacggtgctggagctgttgtcatgggagaagttgctgaaggtcgtggt 
atcattagctatgaaatgggttcagacggtagtggtggtaaatacttgtacttagataga 
gaaactggcaaactcaaaatgaatggtagagaagtatttaaatttgctgtgagaattatg 
ggtgatgcatctacgcgtgtagttgagaaagctggtttatcgtctgaagacatagactta 

tttgttccacatcaggctaatattagaattatggaatctgcgagagagagattaggaata
gaaagagaaaaaatgagtgtctcagtaaataaatatggtaatacttcggctgcctcaata 
ccattaagtattaatcaagaattgcaaaatggaaaaatcaaagatgacgatactttagtc 
ttagttggcttcggtggaggtctaacttggggcgcaatcgttattaaatggggaaaatag 

Sequence 3010

MIIVATATGDMPFPSVANILQERLGTGKVATMDQLAACSGFMYSMITAKQYIQSGDYKHI 
LVVGADKLSKITDMTDRSTAVLFGDGAGAVVMGEVAEGRGIISYEMGSDGSGGKYLYLDR
ETGKLKMNGREVFKFAVRIMGDASTRVVEKAGLSSEDIDLFVPHQANIRIMESARERLGI 
EREKMSVSVNKYGNTSAASIPLSINQELQNGKIKDDDTLVLVGFGGGLTWGAIVIKWGK* 

Sequence 3011 

Contig_0564_pos_6613_0 

is similar to (with p-value 9.0e-47)

>sp:sp|P30790|YHI3_RHOCA HYPOTHETICAL 33.7 KD PROTEIN IN HI
MA 5'REGION (ORF3). >pir:pir|C41608|C41608 hypothetical prot
ein 3 (himA 5' region) - Rhodobacter capsulatus >gp:gp|M8403
0|RCAHIMA_3 Rhodobacter capsulatus integration host factor (
himA) gene, complete cds. NID: g151940.

atgaataaaaataatagagttgttataacgggtatcggagccttatctccaattggtaac
gatgctaaaacaacatgggacaatgcactaaaaggtgttaacggtatagataaaatcaca 
agaatagatactgatgattataatgtacatcttgctggtgaattgaaagattttaatata 
gaagaccacattgatagaaaagaagctcgccgtatggatcggtttacacaatacgcggtg 
gtt gc*:gcaagagaagcggttaaagatgcacaattaaatattaatgaaaaaaatg(;::gac 






cgtattggtgtatggattggttctggtatcggtggtatggaaactttcgaagttgcacat 
acaacacttgtagaaagaggaccacgtcgagtaagtccatttttcgttccaatgttaatt 
cctgatatggctactggtcaagtttctattgatttaggtgccaaagggcccaatggttct 
acagtaacagcttgtgctacggggactaactcaataggtgaggcatttaaaattattcaa 
cgtggtgatgcagatgcaatggtgactggtggaacagaggcacctattacacatatggca
atcgcagggtttagtgcaagtcgtgcattatctacaaacaatgaccctgaaacagcttgt 
cgaccattccaagaaggccgtgatggctttgttatgggtgaaggggcaggtattgttgta 
cttgaatcattagattcagctaaagagagaggcgctgaaatttacgctgaagttgtaggt 
tatggttcctctggcgatgcacatcatattacagcacctgcgcctgaaggtgaaggtggc 
tcacgagctatgcaagctgctttagatgatgctggaatcaaagctcaagatgtacagtat
ttaaatgcacatggcacaagtacacctgttggagatttatatgaggttcaagcgattaaa
aatacattcggtgatgctgcgaagtcattaaaagtaagttcaactaaatcaatgactgga 
catttattaggtgctacaggtggaattgaagctattttttctgcgctatcaattcgtgat 
tcaaaggtagcccctacaatacatgcaatcacaccagacgaagaatgtgatttg 

Sequence 3012 

MNKNNRVVITGIGALSPIGNDAKTTWDNALKGVNGIDKITRIDTDDYNVHLAGELKDFNI
EDHIDRKEARRMDRFTQYAVVAAREAVKDAQLNINEKNADRIGVWIGSGIGGMETFEVAH
TTLVERGPRRVSPFFVPMLIPDMATGQVSIDLGAKGPNGSTVTACATGTNSIGEAFKIIQ
RGDADAMVTGGTEAPITHMAIAGFSASRALSTNNDPETACRPFQEGRDGFVMGEGAGIVV
LESLDSAKERGAEIYAEVVGYGSSGDAHHITAPAPEGEGGSRAMQAALDDAGIKAQDVQY 
LNAHGTSTPVGDLYEVQAIKNTFGDAAKSLKVSSTKSMTGHLLGATGGIEAIFSALSIRD 
SKVAPTIHAITPDEECDL 

Sequence 3013

Contig_0564_pos_1547_414

>pir:pir|I41060|I41060 3-oxoacyl-[acyl-carrier-protein] syn
thase (EC 2.3.1.41) - Escherichia coli >pir:pir|I84544|I8454
4 beta-ketoacyl-acyl carrier protein synthase II - Escherich
ia coli >gp:gp|U20767|ECU20767_1 Escherichia coli beta-ketoa
cyl-acyl carrier protein synthase II (fabF) gene, complete c
ds. NID: g664869. >gp:gp|AE000210|AE000210_5 Escherichia col
i K-12 MG1655 section 100 of 400 of the complete genome. NID
: g1787332. >gp:gp|Z34979|ECFABJ_1 E.coli fabJ gene encoding
beta ketoacyl-acyl carrier protein synthase. NID: g510831.
gtgatacatcgacaaacaaatcaattaaacactgataacttagaaaagaagcaacgacaa 
tataaatatgcttttaatttagctgaaattgattctgaaagttttcctatgcacattttt 
agaaaatatgcgaaagacgtttttgaagaccatcagctatcactattacaacgaggcgag
cgtcaaggggaatatattttaaggcaacaaatttcacattacttatttaatagtcgtggc 

gtcacttgtcacccaaatcaaattattgttggatcatcaacaagccagttactcgatatg
ataaccaatttactaaaaaaagaagaatttattattgaacagccaagttatccacctatt 
aaacatacgcttgataaaaaaggtataagttatattcaagtcccagttgaacaaaatgga 
atacaaatcgaccctattttaaatacaaataacaatattttgtatataacaccatctcat 
caatttccaactggttatgtcaccaatttaaaaaaaagaacacaattaatcaattggtcc 

catcaagctaagcaaagatatattatcgaagatgattatgattcagaatttagatatttt
ggcaaacccatacctgcattacaaagtttagacacaaaaggaaaagtcatttatattagt 
actttctcaaaatctttatatccaagctgtaggattgcatatattgttttgccacaaaat 
ttaatgcacaaatataataatcaaaaatataaagagggaaatacagtgcctgtgcatgtt 
caacacatgattgctcaattcatgataagtgggaaatttgaaagacatttgaataaaatg 

cgaaagatatatagagataaacttgattatattttaaaacgattaaagccctacaatact
caaataaagattgaaggcgcactaactggaatgcattttacaataactgttaataatgga
ttgtcaatgaaacaatgtttaaaaaatgcgaaaaaaaataatttaaaattaaaaccttat 
cattacgaaaattattctaaagtttatccaaaatttattttaggatttggggggataaaa 
aaagaagaattagaagatcatgttaatgcattaattcattcactcgttatataa 

Sequence 3014 

VIHRQTNQLNTDNLEKKQRQYKYAFNLAEIDSESFPMHIFRKYAKDVFEDHQLSLLQRGE
RQGEYILRQQISHYLFNSRGVTCHPNQIIVGSSTSQLLDMITNLLKKEEFIIEQPSYPPI
KHTLDKKGISYIQVPVEQNGIQIDPILNTNNNILYITPSHQFPTGYVTNLKKRTQLINWS
HQAKQRYIIEDDYDSEFRYFGKPIPALQSLDTKGKVIYISTFSKSLYPSCRIAYIVLPQN
LMHKYNNQKYKEGNTVPVHVQHMIAQFMISGKFERHLNKMRKIYRDKLDYILKRLKPYNT 
QIKIEGALTGMHFTITVNNGLSMKQCLKNAKKNNLKLKPYHYENYSKVYPKFILGFGGIK 
KEELEDHVNALIHSLVI*

Sequence 3015 
Contig_0566_pos_9236_0 
is similar to (with p-value 5.0e-35)
>sp:sp|P94378|3MGH_BACSU PUTATIVE 3-METHYLADENINE DNA GLYCO
SYLASE (EC 3.2.2.-). >gp:gp|Z99123|BSUB0020_157 Bacillus sub
tilis complete genome (section 20 of 21): from 3798401 to 40
10550. NID: g2636240. >gp:gp|D83026|D83026_59 Bacillus subti
lis genome sequence covering lic-cel region. NID: g1783231.
atgaatatagttgatgtaaatgatgttgagaaaatcgacttagcaatagatggcgctgat 

gaagtagacagtgcgcttaaccttattaaaggcggtggtggagccttatttagggaaaag
gtcatagatgaaatggctgaccgatttgtcgttgttgtagatgaaagtaaactcgtcaac 
tatttaggagaaacatttgcattaccagttgaagtcgataaatttaattggtaccaagtt 
gccaaaaaaattgagcgtacttatgatattcatgtaagcagaagagttaatgaagatgta 
ccgtttataaccgacaatggtaattacatattagattgttcattgcaaaatagaattcct 

gcttatgagctac

Sequence 3016 

MNIVDVNDVEKIDLAIDGADEVDSALNLIKGGGGALFREKVIDEMADRFVVVVDESKLVN
YLGETFALPVEVDKFNWYQVAKKIERTYDIHVSRRVNEDVPFITDNGNYILDCSLQNRIP
AYELX

Sequence 3017 

Contig_0566_pos_8875_8186

is similar to (with p-value 3.0e-19)
>sp:sp|P32157|YIIM_ECOLI HYPOTHETICAL 26.6 KD PROTEIN IN KD
GT-CPXA INTERGENIC REGION (O234). >pir:pir|S40854|S40854 hyp
othetical protein o234 - Escherichia coli >gp:gp|L19201|ECOU
W87_53 E. coli chromosomal region from 87.2 to 89.2 minutes.
NID: g304961. >gp:gp|AE000466|AE000466_1 Escherichia coli K
-12 MG1655 section 356 of 400 of the complete genome. NID: g
2367328.

gtgatatacatgattaaagtgaatgccatatctattggcaaaatagaaacattgtcttat 
ggaaactataaaccaatgcaatcagcgttaaacaaaattccttttaaaggtcaaatgtgg 
ctcaatcgacttgggttcgtggacgatgaacaagcctatcataaccatggtggtatacat 

aaagcgatatgttgttttagtaaatctaattatcaattatttaaagatgacttagatcaa
ttacctgaatttgcaatgtttggagagaatttgacagttgaacatctagatgaagcagat 
gtttP.ttt-.tggtaatcagtatcaactaggcgatacaatcatagaagtatcagatatacga 
gaaccttgttggaaaattcaagctaaatatgcaatacctaatttagttcaaaaaacgtcg
caatctggtaaaactggattttattttagagttataaaagaaggatatgtacatcagagt 

gataatttaaaactcattaaaaaggcagaatcaaacacacgtctatctgtgaaagactta
aatcatctattctataatgagcgaaataatttaagattaatctatcatgcacttcgaaat 
ccttatctttcacctgatcgaaagaaaaaactacagaaaatgaaaacgcgtgccgaaaat 
agaaaattcattaaatctgacgataaatag 

Sequence 3018

VIYMIKVNAISIGKIETLSYGNYKPMQSALNKIPFKGQMWLNRLGFVDDEQAYHNHGGIH 
KAICCFSKSNYQLFKDDLDQLPEFAMFGENLTVEHLDEADVYFGNQYQLGDTIIEVSDIR 
EPCWKIQAKYAIPNLVQKMSQSGKTGFYFRVIKEGYVHQSDNLKLIKKAESNTRLSVKDL 
NHLFYNERNNLRLIYHALRNPYLSPDRKKKLQKMKTRAENRKFIKSDDK* 

Sequence 3019 
Contig_0566_pos_3563_2946
is similar to (with p-value 8.0e-19)
>sp:sp|P47968|RPIA_MOUSE RIBOSE 5-PHOSPHATE ISOMERASE (EC 5
.3.1.6) (PHOSPHORIBOISOMERASE). >gp:gp|L35034|MUSRPI_1 Mus m
usculus ribose 5-phosphate isomerase (RPI) mRNA exons 1-6, c
omplete cds. NID: g836673.

gtgatatatttggactttataaatcagcagacaaccca^actgcaaaagctttattaggt 
gttaaaattatttaccaagatgactatcaaacatatactggatatattgtagaaactgaa
gcttatttaggtatacaagataaagctgcacatggttttggtggcaaaataacaccaaaa 
gtgacttctttatataaaaaaggtggcacgatatatgcacatgtgatgcatacgcactta 
ttaatcaattttgttacacggactgagggcataccagaaggtgtacttattcgtgctatt 
gaaccagatgaaggtatcggcgctatgaacgtcaatcgtggaaaatctggatacgagctc 
actaatggtccaggaaagtggactaaagctttcaatattccacgatcaattgatggctca
accttaaatgactgcaaattatctatagataccaatcatcgcaaatatccaaaaactatt 
atagaaagtggtcgtatcggtattcctaataaaggagaatggacaaataaaccactacgt 
ttcactgttaaaggcaatccatatgtctctagaatgcgcaaatcagattttcaaaatccc 
gacgatacatggaaataa 

Sequence 3020 

VIYLDFINQQTTQTAKALLGVKIIYQDDYQTYTGYIVETEAYLGIQDKAAHGFGGKITPK 
VTSLYKKGGTIYAHVMHTHLLINFVTRTEGIPEGVLIRAIEPDEGIGAMNVNRGKSGYEL
TNGPGKWTKAFNIPRSIDGSTLNDCKLSIDTNHRKYPKTIIESGRIGIPNKGEWTNKPLR
FTVKGNPYVSRMRKSDFQNPDDTWK*

Sequence 3021 

Contig_0568_pos_4867_2714

is similar to (with p-value 3.0e-54)

>sp:sp|P45112|RECJ_HAEIN SINGLE-STRANDED-DNA-SPECIFIC EXONU
CLEASE RECJ (EC 3.1.-.-). >pir:pir|F64110|F64110 single-stra
nded-DNA-specific exonuclease (recJ) homolog - Haemophilus i
nfluenzae (strain Rd KW20) >gp:gp|U32801|U32801_1 Haemophilu
s influenzae Rd section 116 of 163 of the complete genome. N
ID: g1574143.

gtggataatcaggagattcaaaatctatttgaaggcactaacattagtcatgattatatg 
ttattaagtgatatgcaaaaagccattgatcgtattaaattagctatcgatcaaaatgaa 
cgaatactagtatatggtgactatgatgcagatggtgttacatctactacaattctagtg 
tctactttacgtctacttggcgctcaggtggggtggtatattcccaatagatttacagaa 

ggatatggacctaatgaattagcatttaaaaatgcttatgacgaagggatttccttaata
ataactgtagataatggtatacagggacatcatgaaataagtaccatacaagaattaggt 
gtagatgttatagtgacagatcatcatgaaataggagaaactttacctgatgcttttgca 
attgtacatccgatgcatcctaattttgaatatccttttaaatatttatgtggtgcgggt
gttgcttataaattggcccaaggattgatagagcatccacctcaacatttcatagcattci 

40 gctgccataggtacaattgcagatttagtatcattgacagatgagaatagatatattgt-a: 
aagcaaggagtaaagatattaaacaatcatacaccatcgtccataaaggctatcttaaa^ 
caagcgggttttaatgatgaaataacagaagaaacaattggttttattattggacctcgc
ttaaatgcggttggtagactagaagatgcatcattagctgctgaacttttattgtccgat 
gaatttgaagaggcggaatttttagctgaacaagttgaacattttaatcatgaacgtaaa

gatatagtatctaaaattactaatgaagcattgttattagcagaggaacaaatcaagcaa
ggccatttgtttcttttacttgccaaagaggggtggcatgagggtgtattaggtattgte 
gcatctaaaattgtagaaacatatgcactacctacattaattttaaatattgatgaaaat 
caaaatcatgccaagggttctgcgaggtcgattgaacaagtttccatgtttgatatttta 
aatgatcatcaacagttaattgataagtttggtggtcatcacatggctgcaggaatgaca

atgtctatcgataatattgaacatttacataaagagctagatatgtggatgaaagaacta
actgttaccacttcattagagccttcaataaaggtggatgcacaacttgaagaaaaagaa 
attaacattaaaaatattaaagatatatttcaattaaggccttttggtacggactttaat
agtcctctttttatggttagagatctaattgtcaagtcaacaaagggaattggacaggat 
aataagcatcttaagttaacgcttggtcattcaggtttaactgctttattttggaatcat 

ggacatttggcaagtgaacttgaaccaggtcaaccgattcatataataggaacattgcaa
attaatgaatggaatggtaatcaaacacctcaatttatcatcaaagacattgctatagac 
caattacaaattttagattatcgtagtaaacgtaaaaatatacaatttaaagaaactgaa 
tcaaatgttgcctatgtcattcatccgaaacttaaaaaaagcaattcacattactatcat
tatggtgaggttattgatagaccttatgataaaatagtgtttagagatttacctaatact
atggttgaaattgaacaaaccttagaacattcacaaatttctcaactatatttagttctg
cagcatgaaaagtcaatatattttgaaggtatccctagcaagtcgctttttaaaaagtgt 
tacaaagctttaataaacaaaaaagaaactgacttaattaaagaaggtatgctgctttgt 
gagtacttaaatattaagccagaaatattaacattcatgcttaaagtctttaaagaattg 
gagtttatttacgacgaaaagggattgattaagataaatccagcgccaaataaacaagat
atagaaaatagtcgtatttaccaaatgagacgagcacgtatggaagtagaagaacgtctt 
ctctatgatgattttttaaatataaaagaatggataatatcaaagttaacatag 

Sequence 3022 

VDNQEIQNLFEGTNISHDYMLLSDMQKAIDRIKLAIDQNERILVYGDYDADGVTSTTILV
STLRLLGAQVGWYIPNRFTEGYGPNELAFKNAYDEGISLIITVDNGIQGHHEISTIQELG 
VDVIVTDHHEIGETLPDAFAIVHPMHPNFEYPFKYLCGAGVAYKLAQGLIEHPPQHFIAL 
AAIGTIADLVSLTDENRYIVKQGVKILNNHTPSSIKAILNQAGFNDEITEETIGFIIGPR
LNAVGRLEDASLAAELLLSDEFEEAEFLAEQVEHFNHERKDIVSKITNEALLLAEEQIKQ 

GHLFLLLAKEGWHEGVLGIVASKIVETYALPTLILNIDENQNHAKGSARSIEQVSMFDIL
NDHQQLIDKFGGHHMAAGMTMSIDNIEHLHKELDMWMKELTVTTSLEPSIKVDAQLEEKE 
INIKNIKDIFQLRPFGTDFNSPLFMVRDLIVKSTKGIGQDNKHLKLTLGHSGLTALFWNH 
GHLASELEPGQPIHIIGTLQINEWNGNQTPQFIIKDIAIDQLQILDYRSKRKNIQFKETE 
SNVAYVIHPKLKKSNSHYYHYGEVIDRPYDKIVFRDLPNTMVEIEQTLEHSQISQLYLVL 

QHEKSIYFEGIPSKSLFKKCYKALINKKETDLIKEGMLLCEYLNIKPEILTFMLKVFKEL
EFIYDEKGLIKINPAPNKQDIENSRIYQMRRARMEVEERLLYDDFLNIKEWIISKLT* 

Sequence 3023 

Contig_0569_pos_4529_5401

is similar to (with p-value 3.0e-33)

>sp:sp|P39312|CYCA_ECOLI D-SERINE/ALANINE/GLYCINE TRANSPO
RTER. >pir:pir|S56433|S56433 hypothetical protein o470 - Esc
herichia coli >gp:gp|U14003|ECOUW93_120 Escherichia coli K-1
2 chromosomal region from 92.8 to 00.1 minutes. NID: g126317
2. >gp:gp|AE000492|AE000492_4 Escherichia coli K-12 MG1655 s
ection 382 of 400 of the complete genome. NID: g1790649.
atgcctgaactacctgaagttgaacatgttaaaagaggtattgagccatttataaaaagt 
gcaaaaatagagaaagtaacttttgctaaaaatgtaattaacggtaagaataataaccgt 
gagacgattataaaaggtatggaattagatacttttaaaaaacttactgaaggttatgtt 

ataaaaaaagttgaaagaagaagtaagtacattattttttatatagcggatcatgacgat
gatagaatcttagttagtcatttaggtatggcaggcggattctttgttgttaataacctt
gatgagataagtacaccgaattatcgaaagcattggcaagtcattttcgatttggataat 
aaacaaaaattagtctattctgatatcagacggtttggagaaattagaaatatagtcaat 
tttgatagttatccatctttattagaaatcgctccagaaccatttgaagaggtggcattt 

gaacactatttagaatgtttgacaatgaaaaaatataagaataaaccaataaaacaaacg
attcttgatcatcgtgttatagcaggagctggaaatatctatgcctgtgaagctttattc 
agagctggtattactccggataaaattactaattcactcactaaacaagaaagaaaatcc 
ctcttttattatgttcgagaagttttagaagagggtataaaatatggaggtactagtatt 
tcagattataggcatgcagatggtaaaactggacaaatgcaattacatttaaatgtatat 

aaacaaaaaaagtgcaaggtttgtggtcattcgattgaaacaaaagtgatagctggtaga
aatagtcatttttgcccaaactgtcagagataa 

Sequence 3024 

MPELPEVEHVKRGIEPFIKSAKIEKVTFAKNVINGKNNNRETIIKGMELDTFKKLTEGYV 
IKKVERRSKYIIFYIADHDDDRILVSHLGMAGGFFVVNNLDEISTPNYRKHWQVIFDLDN
KQKLVYSDIRRFGEIRNIVNFDSYPSLLEIAPEPFEEVAFEHYLECLTMKKYKNKPIKQT 
ILDHRVIAGAGNIYACEALFRAGITPDKITNSLTKQERKSLFYYVREVLEEGIKYGGTSI 
SDYRHADGKTGQMQLHLNVYKQKKCKVCGHSIETKVIAGRNSHFCPNCQR* 

Sequence 3025

Contig_0569_pos_8390_8887
is similar to (with p-value 1.0e-38)
>sp:sp|P44948|FPG_HAEIN FORMAMIDOPYRIMIDINE-DNA GLYCOSYLASE
(EC 3.2.2.23) (FAPY-DNA GLYCOSYLASE). >pir:pir|A64104|A6410
4 formamidopyrimidine-DNA glycosylase (fpg) homolog - Haemop
hilus influenzae (strain Rd KW20) >gp:gp|U32776|U32776_1 Hae
mophilus influenzae Rd section 91 of 163 of the complete gen
ome. NID: g1573969.
gtgattaaacttgggagggtgacatatatgaaatgcccaaaatgtaattctacacattcc
agagtggttgattcaagacatgcagatgaggccaatgcgattagacgtagaagagaatgt 
gaaaattgcggaacgcgttttacaacatttgaacatattgaagttagtccattaatagta 
gtgaagaaagatgggactagagaacaatttttaagagaaaaaatattaaatggtttagta 
agatcttgcgagaaacgaccagtacgttatcaacaacttgaagacataactaataaagtg 
gagtggcaacttagagatgagggacaaactgaaatttcttctagagaaattggagagcat
gttatgaatttgttaatgcatgttgaccaagtttcctatgtaagatttgcatctgtatat 
aaagaattcaaagatgttgatcaactcttagagtcaatgcaaggtatcttgagtgataat 
aaacggagtgataaatag 

Sequence 3026

VIKLGRVTYMKCPKCNSTHSRVVDSRHADEANAIRRRRECENCGTRFTTFEHIEVSPLIV
VKKDGTREQFLREKILNGLVRSCEKRPVRYQQLEDITNKVEWQLRDEGQTEISSREIGEH 
VMNLLMHVDQVSYVRFASVYKEFKDVDQLLESMQGILSDNKRSDK* 

Sequence 3027

Contig_0569_pos_1314_691 

is similar to (with p-value 1.0e-33)
>sp:sp|P46227|YRS1_SYNP6 HYPOTHETICAL 19.1 KD PROTEIN IN PS
BB-RPS1 INTERGENIC REGION (ORF 168). >pir:pir|S51484|S51484
hypothetical protein 168 - Synechococcus sp. >gp:gp|D28752|S
YORPS1_2 Synechococcus sp. gene for ribosomal protein S1, co
mplete cds. NID: g560122.

atgtctgttgtaccatggcaacaattgaatcctgctgacagtccatacgttaaaatgttt 
ggattagttggaatcccttttgcagcaggtattattaactttgttgtacttacagctgca 

gcctcttcttgtaatagtggtatatttgctaatagccgtacgatgtttggattagctgga
agaaagcaaggtccagcattcttacatagaaccaataagcacggcgtaccacattatgct 
attttagtgacatgtggcttattaagtatttcagtcgtgttaaatgcaatttttaaagat 
gcgactaaagtgttcgtacaaattacaacattttcaactgttttaaatattatgatttgg 
acaattattatgatcgcgtatctaggttatttaagacatgaaccgaaacagcataaagaa 

agtaactataaaatgtggggcggaaaatacatggcttacagtattttagggttctttgca
tttatttttattatactattgattaatagtgcaacgcgttatgccgtactttctgcaccc 
gtatggtttgttatcatgctattgatgtatcaaaaatataaaaaagaatctcgcaaagct 
aaaattaaaaatgaggaagagtaa 

Sequence 3028

MSVVPWQQLNPADSPYVKMFGLVGIPFAAGIINFVVLTAAASSCNSGIFANSRTMFGLAG
RKQGPAFLHRTNKHGVPHYAILVTCGLLSISVVLNAIFKDATKVFVQITTFSTVLNIMIW
TIIMIAYLGYLRHEPKQHKESNYKMWGGKYMAYSILGFFAFIFIILLINSATRYAVLSAP 
VWFVIMLLMYQKYKKESRKAKIKNEEE* 


Sequence 3029 
Contig_0570_pos_0_488
is similar to (with p-value 5.0e-39) 
>sp:sp|P54715|PTIB_BACSU PTS SYSTEM, ARBUTIN-LIKE IIBC COMP
ONENT (PHOSPHOTRANSFERASE ENZYME II, BC COMPONENT) (EC 2.7.1
.69). >gp:gp|Z99108|BSUB0005_89 Bacillus subtilis complete g
enome (section 5 of 21): from 802821 to 1011250. NID: g26330
55. >gp:gp|D50543|D50543_3 Bacillus subtilis DNA for 76-degr
ee region, complete cds. NID: g1486240.

atgtttgcttttttcggtattgttttgggattcgctacattatttaaaaatccaaccatt
atgggaggattagctgatcagcaaacattttggtttaaattttggtctgttattgaatca 
ggtggttgggtaatatttacacatatggaaattgtctttgtagttggcttaccattatct 
cttgctaaaaaagcaccaggacatgcagctttagcagctctaatgggatatttaatgttt 
aatacttttatcaatgcaattttaactcaatggccacatacttttggcgctaatttaaaa



aaaggtgtagaaaacacaacaggattaaaatcgattgcaggtattgaaacgttagatacc 
aatattttaggtgcaatcattatctcaggaataataacgtggatacataatagatattac 
agtaagcgtttacctgaaatgttaggtgtatttcaaggattaacattcgttgtaacaatc 

tctttctt 

Sequence 3030

MFAFFGIVLGFATLFKNPTIMGGLADQQTFWFKFWSVIESGGWVIFTHMEIVFVVGLPLS
LAKKAPGHAALAALMGYLMFNTFINAILTQWPHTFGANLKKGVENTTGLKSIAGIETLDT
NILGAIIISGIITWIHNRYYSKRLPEMLGVFQGLTFVVTISFX



Sequence 3031 
Contig_0580_pos_966_1382
is similar to (with p-value 5.0e-85) 
>sp:sp|P43848|PUR5_HAEIN PHOSPHORIBOSYLFORMYLGLYCINAMIDINE
CYCLO-LIGASE (EC 6.3.3.1) (AIRS) (PHOSPHORIBOSYL-AMINOIMIDAZ
OLE SYNTHETASE) (AIR SYNTHASE). >pir:pir|G64122|G64122 5'-ph
osphoribosyl-5-aminoimidazole synthetase (purM) homolog - Ha
emophilus influenzae (strain Rd KW20) >gp:gp|U32822|U32822_2
Haemophilus influenzae Rd section 137 of 163 of the complet
e genome. NID: g1574265.

atggaagcacaaatggaaaaagatggtaattactatatggaaggaatattagatgatatt 
caacaagatggatatggtttcttaagaaccgttaactattctaaaggtgagaaggatatt 
tatatttctgcaagccaaattcgacgttttgaaataaaacgtggtgataaagtaacgggt 
aaagttcgtaaaccaaaagataatgaaaaatattatggtctacttcaagttgattttgta 

aacgaccataatgcagaagaagtcaaaaaacgtccttcacttccaagctttaacacctct
ttatccggaagaaagaatcctattagaaacgcaatctacaaattattccactcgtattat 
ggatttagtcacaccaataggtcttggtcaacgtggtcttatagttgcaccacctaa 

Sequence 3032

MEAQMEKDGNYYMEGILDDIQQDGYGFLRTVNYSKGEKDIYISASQIRRFEIKRGDKVTG
KVRKPKDNEKYYGLLQVDFVNDHNAEEVKKRPSLPSFNTSLSGRKNPIRNAIYKLFHSYY
GFSHTNRSWSTWSYSCTT* 

Sequence 3033
Contig_0580_pos_2122_3000

is similar to (with p-value 1.0e-27)

>sp:sp|P38527|RHO_THEMA TRANSCRIPTION TERMINATION FACTOR RH
O. >gp:gp|L27279|TMORHO_1 Thermotoga maritima rho gene, comp
lete cds. NID: g454858.
atgaaagcgccagttctggtatcaggtactgacggtgtgggtacaaagttaaaattagca
attgactatggaaagcatgacacaattggtattgatgctgtcgcaatgtgtgtaaatgat 
attttaacaacaggtgctgaacctttatactttttagactatattgccacgaataaagta 
gtgccaagtactatagagcaaatcgttaaaggtataagtgacggttgcgaacaaaccaat 
acggcacttataggcggtgaaactgctgaaatgggagaaatgtatcatgaaggtgaatat 
gatattgctggttttgcagtaggagcggtagagaaagaggactatattgatggttcaaat
gttgaagaaggacaagcaattattggtttagcttcaagtggtattcattcaaatggctat 
agtctagttagaaaaatgataaaagaatcaggagttcaattacatgatcaatttaatggt 
caaacctttttagaaaccttccttgcaccaacaaaattgtatgtaaagcctattcttgaa 
ttaaagaaacatattgatatcaaagcgatgagccatattactggtggaggtttctatgaa 
aatattccgcgtgcccttcctaaaggtttatcagcaaaaatagatacacaatcattccca
acgttggaagtctttaattggcttcaaaaacagggcaacatttcaacgaatgaaatgtat 
aacatatttaatatgggtattggatatacaattattgttgacaaaaaagatgttcaaaca 
acattaacaacgttacgtgcaatggatacaactgcatatgaaattggtgagattataaaa 
gatgatgatacacctattcatttattggaggtagaatag 



Sequence 3034 

MKAPVLVSGTDGVGTKLKLAIDYGKHDTIGIDAVAMCVNDILTTGAEPLYFLDYIATNKV 
VPSTIEQIVKGISDGCEQTNTALIGGETAEMGEMYHEGEYDIAGFAVGAVEKEDYIDGSN 
VEEGQAIIGLASSGIHSNGYSLVRKMIKESGVQLHDQFNGQTFLETFLAPTKLYVKPILE
LKKHIDIKAMSHITGGGFYENIPRALPKGLSAKIDTQSFPTLEVFNWLQKQGNISTNEMY
NIFNMGIGYTIIVDKKDVQTTLTTLRAMDTTAYEIGEIIKDDDTPIHLLEVE*

Sequence 3035 
Contig_0581_pos_6447_3826

is similar to (with p-value 4.0e-19) 
>gp:gp|X99710|LLLVSFPEP_4 L.lactis ORF, genes homologous to
vsf-1 and pepF2 and gene encoding protein homologous to met
hyltransferase. NID: g1771200.

atgaaagctagtgaaattaggcaaaaatatttgaatttctttgtagaaaaaggacatatg
attgaaccgtctgcaccacttgtacctatcgatgatgattcattattgtggatcaattct 
ggtgtagctaccttgaaaaaatactttgacggacgcgaaactcctaaaaaaccaagaatt 
gtcaattctcaaaaagctatacgaacaaacgatatagaaaacgtcggttttactgctcgc 
catcatactttttttgagatgctaggtaatttttcaatcggtgactactttaaacatgaa 

gcgattgaatttgcttgggaatttctaacaagtgataaatggatgggcatggaacctgag
aagctatatgtgactattcatcctgaagatactgaagcttttagaatttggcatgaagac 
attggtttagaggaaagtcgcatcattcgaatagaaggcaatttttgggatatcggggaa 
ggaccatctggaccaaatacagaaattttctatgatcgtggatcggcttatggaaaagat 
gatcctgctgaagaaacgtatcctggtggtgaaaatgaaagatatttagaagtttggaat 

ctagtgtttagtgagtttaatcacaataaagacaatacttacacaccactaccaaataaa
aatattgatactggcatgggattagaacgtatgacgtctatctcacaaaatgtaagaaca 
aattatgaaacagacttatttatgcctataattaaggaagtagaacatgtttcaggaaaa 
aaatatttaattgatgatgcacaagatgttgcatttaaagttatcgcagaccacattaga 
acgatttctttcgcaattgctgatggcgcattaccagctaatgagggtagaggatatgta 

ttaagaagattattacgcagagcagttcgctttagccaatcattaggaattaatgaacca
tttatgtataaacttgttgatatagtcgctgatatcatggaaccatattatccaaatgtc 
aaagacaaatccaactttattaaacgtgtcattaaatcagaggaagaacgcttccatgaa 
acgcttgaggaaggtcttacgattttaaatgaactgataaaagaagcgaagaatagtgat 
caggttattaaaggtcatgatgcttttaagttatatgatacttatggattcccaatagaa 

ttaactgaagaattagcaactcaagaaaatttgtctgttgatatgcccacttttgaacag
gaaatgcaacaacagagagatcgagctagacaagctcgacagaattctcaatcaatgcaa 
gttcaaagtgaagtactaaaaaatattcaagatgaaagtcaatttgttggctatgaaact 
acggactatcaatcattaataactcatatcatatacaatggtgaagaagttaaacatgtt 
gaagcaggagaaacaatttactttattttaagagaaacgcctttctatgcagtaagtggt 

ggacaggtcgcagataagggaacagttggtaatgagagctttgaaataaatgtaactgac
gtaactaaagcgcctaatggccaaaacttacacaaaggtattgtgcaatttggtgaagca 
acacagaacgcgaaagtagaagcacgtgttaacaaagaggatagacgacttattcaaaaa 
aatcatagtgctacacatttattacatgctgcattaaaagaagtattaggagatcatgtt 
aatcaggctggttcgttagtagaacctgaaagactacgttttgatttctcacattttggt 

cctatgacacaagaagaaattaatttagtagaacgtagagtaaatgaagaaatttggaga
gctatcgacgtccgtattcaagaaatgagtattgaagaagccaaatcaataggcgctatg 
gctttatttggtgaaaaatatggagatattgttcgcgttgtcaatatggcaccattttca 
atagaattgtgcggtggaatacatgtgaataatactgcggaaattggtctctttaagatt 
gtgagtgaatctggaacaggtgccggtgttagaagaattgaagctttaacaggtaaaggt 

gcattcttacatcttgaagaaattgaaacacagtttaataatattaaaaatcatttaaaa
gttaaatccgataaccaagtagttgaaaaagttaaacaacttcaagaagaagaaaaagga 
ctgcttaaacaattagaacaacgcaacaaagaaataacatcactaaagatggggaacatt 
gaagaacaggttgagttgattaataatttgaaagttttagcaacagaagtagaaattcca 
aatccaaaagcgatacgttcaactatggatgactttaaatctaaacttcaagatactatt 

atagtgttagtcggacaagttgatggaaaggtttctgtaattgctacagtaccaaaatca
cttacaaatcaagtaaaagctggagatcttatcaaaaacatgacaccaattattggtgga 
aaaggtggaggtcgtcctgatatggctcaaggtggcggaactcaacctgaaaaaataaca 
gaagcattacgctttattaaagattacattaaaaatctataa 

Sequence 3036

MKASEIRQKYLNFFVEKGHMIEPSAPLVPIDDDSLLWINSGVATLKKYFDGRETPKKPRI 
VNSQKAIRTNDIENVGFTARHHTFFEMLGNFSIGDYFKHEAIEFAWEFLTSDKWMGMEPE 
KLYVTIHPEDTEAFRIWHEDIGLEESRIIRIEGNFWDIGEGPSGPNTEIFYDRGSAYGKD 
DPAEEMYPGGENERYLEVWNLVFSEFNHNKDNTYTPLPNKNIDTGMGLERMTSISQNVRT
NYETDLFMPIIKEVEHVSGKKYLIDDAQDVAFKVIADHIRTISFAIADGALPANEGRGYV
LRRLLRRAVRFSQSLGINEPFMYKLVDIVADIMEPYYPNVKDKSNFIKRVIKSEEERFHE 
TLEEGLTILNELIKEAKNSDQVIKGHDAFKLYDTYGFPIELTEELATQENLSVDMPTFEQ 
EMQQQRDRARQARQNSQSMQVQSEVLKNIQDESQFVGYETTDYQSLITHIIYNGEEVKHV 
EAGETIYFILRETPFYAVSGGQVADKGTVGNESFEINVTDVTKAPNGQNLHKGIVQFGEA
TQNAKVEARVNKEDRRLIQKNHSATHLLHAALKEVLGDHVNQAGSLVEPERLRFDFSHFG 
PMTQEEINLVERRVNEEIWRAIDVRIQEMSIEEAKSIGAMALFGEKYGDIVRVVNMAPFS
IELCGGIHVNNTAEIGLFKIVSESGTGAGVRRIEALTGKGAFLHLEEIETQFNNIKNHLK
VKSDNQVVEKVKQLQEEEKGLLKQLEQRNKEITSLKMGNIEEQVELINNLKVLATEVEIP
NPKAIRSTMDDFKSKLQDTIIVLVGQVDGKVSVIATVPKSLTNQVKAGDLIKNMTPIIGG
KGGGRPDMAQGGGTQPEKITEALRFIKDYIKNL* 

Sequence 3037 
Contig_0581_pos_3500_3072
is similar to (with p-value 5.0e-25)

>gp:gp|U46071|RCU46071_3 Rhodobacter capsulatus cytochrome
c biogenesis (cycH) gene, complete cds, and sarcosine oxidas
e gene, partial cds. NID: g1353871.

atgctaaagcataaaattttaggactagatgttggaagtaaaactgttggcattgctata 
agtgaccttatgggttggactgctcaagggttagacacactccgtatcaacgaagaacaa
gatgatttaggaattgatcaactcgtgaagattattaaagataatcaagttggaactgtt 
gttattggcttgcccaagaatatgaacaattcaataggttttcggggagaggcttcaata 
aaatataaagaaaagttacaagagtctatcccttctattgatattgttatgtgggacgaa 
cgtttaagtacaatggctgctgaaagatctttacttgaagcagatgtttcaagacaaaaa 
agaaaaaaggtaatagataaaatggcagctgtatttattttacaaggctatttagattct
attcaataa 

Sequence 3038

MLKHKILGLDVGSKTVGIAISDLMGWTAQGLDTLRINEEQDDLGIDQLVKIIKDNQVGTV 
VIGLPKNMNNSIGFRGEASIKYKEKLQESIPSIDIVMWDERLSTMAAERSLLEADVSRQK
RKKVIDKMAAVFILQGYLDSIQ* 

Sequence 3039 
Contig_0581_pos_2179_1685
>sp:sp|P00957|SYA_ECOLI ALANYL-TRNA SYNTHETASE (EC 6.1.1.7)
(ALANINE--TRNA LIGASE) (ALARS). >gp:gp|D90892|D90892_9 E.co
li genomic DNA, Kohara clone #446(60.5-60.9 min.). NID: g180
0074. >gp:gp|AE000353|AE000353_11 Escherichia coli K-12 MG16
55 section 243 of 400 of the complete genome. NID: g1789037.


atgaatagaactaaaaatattttagaaattggtacagccattggttatagttcaatgcaa 
ttcgctaatatttcgaaagatataaatgtaacaacaattgaacgaaacgaagacatgatt 
catcttgcaaaaaagttcataaaaaagtatcgataccagaatcaaatccgtttaattgaa 
tacgatgctttgaatgcatttgaacaagtcaatgacaaacaatatgatatgatatttatc 
gacgcagctaaagcacaatcaatgaaatttttccaactatatacaccgttattaaaaaaa
ggtggaattgtggttactgataatgttttatatcatggatttgtttcaaatatagacgtt 
gttcgttcgagaaatgtgaaacaaatggttaagaaagtgcaacagtacaatgaatggttg 
atggagcaatctcaatttacaacaaactttataaatatggatgatggattagcaatatct 
ataaaaggagaatga 


Sequence 3040 

MNRTKNILEIGTAIGYSSMQFANISKDINVTTIERNEDMIHLAKKFIKKYRYQNQIRLIE 
YDALNAFEQVNDKQYDMIFIDAAKAQSMKFFQLYTPLLKKGGIVVTDNVLYHGFVSNIDV
VRSRNVKQMVKKVQQYNEWLMEQSQFTTNFINMDDGLAISIKGE*


Sequence 3041 

Contig_0584_pos_9628_10956 
is similar to (with p-value 2.0e-23) 
>sp:sp|P75822|YBJT_ECOLI HYPOTHETICAL 53.7 KD PROTEIN IN AR
TP-POXB INTERGENIC REGION.

atgatgccatcagcaaaattaacgcaggctagctttgaagatatggatgcattgttagca 

gataattttgcgcgtgcggcgcaaaagcagggtgttaagcatattgtctacatgagtggt 
ttaataccagaaaatgatgagctatctgcacatttaagaagtcgacttgaatgcgaaaaa 

attcttggtgattacggtataccagttagcacattacgtgcaggtttaattattggtgca
aaaggaagttcttatccaattcttaaacgactagttaagagattgccagcaatggttttg 
cctagttgggctcacaataaaattgcacctgttgccattgacgatgtgatagatggttta 
gcagcattggtgaatcgaacacccaaagataacgaagcaattgatatcacaggtcctgaa 
gtgatgaattataaaacgctgatacagcgcacagctaacgtacttgataagcgactgcct 

atgcttgatttacctattatacccattatcataagtcggtattgggtacaactgatttca
aatgtaccgaaagaaatggtatatccattaatgaatagtttaactcacgatatggtacca 
catcgaaaacgcgttgtgtctaacttgtccgtaggaaatatcacctttgaagatagtgtg 
aaaagagcactaagagaagaacaaaagacttctaagaaaaagtcggattcgaaaaattct 
caatcatttgggcgtatgcatcaagaaattaaagatgtacgagccattacacggtttaaa 

attccggaaggttattcgattaaagatgtgactaaagaatatgcaaaattcatcaataat
atcacactacatctcgttaaaggtacgataaatgaacgagaatttaatatgaatttgccc 
ttcattaataaatttattttaaaaatggaacgtgatgaagctgactctacagaagatatg 
gtggtatataatattgtgggtggcgatttagcacattcaaatgatggtggaaatgcacgc 
tttgaattccgaagaataagaaacaccaatgagggtattattgctttacaagaatatgaa 

cctacattaccttgggtagtatataaactaactcaagctaaagcacacaagactgttatg
aatatttttaaaaataaaatggcacgtttatcgcaacaaaaaaatgtgaaagatgaaaca 
tatatgtctaatcgtgtaactattggagtaacagtagcatcagcgttcgttattggaagc 
gcagtagggttccaactttttaaaaagcatcaaatcaaaaagaacacaatgtcgaatgca 
gaattataa 


Sequence 3042

MMPSAKLTQASFEDMDALLADNFARAAQKQGVKHIVYMSGLIPENDELSAHLRSRLECEK
ILGDYGIPVSTLRAGLIIGAKGSSYPILKRLVKRLPAMVLPSWAYNKIAPVAIDDVIDGL 
AALVNRTPKDNEAIDITGPEVMNYKTLIQRTANVLDKRLPMLDLPIIPIIISRYWVQLIS 
NVPKEMVYPLMNSLTHDMVPHRKRVVSNLSVGNITFEDSVKRALREEQKTSKKKSDSKNS
QSFGRMHQEIKDVRAITRFKIPEGYSIKDVTKEYAKFINNITLHLVKGTINEREFNMNLP 
FINKFILKMERDEADSTEDMVVYNIVGGDLAHSNDGGNARFEFRRIRNTNEGIIALQEYE
PTLPWVVYKLTQAKAHKTVMNIFKNKMARLSQQKNVKDETYMSNRVTIGVTVASAFVIGS
AVGFQLFKKHQIKKNTMSNAEL*


Sequence 3043
Contig_0585_pos_4328_4747
is similar to (with p-value 7.0e-17) 
>sp:sp|P77279|YBBL_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-B
INDING PROTEIN IN USHA-TESA INTERGENIC REGION. >gp:gp|U82664
|ECU82664_88 Escherichia coli minutes 9 to 11 genomic sequen
ce. NID: g1773084. >gp:gp|AE000155|AE000155_6 Escherichia co
li K-12 MG1655 section 45 of 400 of the complete genome. NID
: g1786692.

atgcaacaaagtgagttaatcggttatacaattgaagataatatgaaatttcctgctgag
gctagaagtgaagcttttgaccgtgataaagcgaaacaactcatctctcaagtaggatta 
ggtaattatcagttagatgctcaaattgagcacatgtctgggggagagcaacaacgtatt 
accatcgctagacaactcatgtatgaacctgaagttttattattggacgaagctactagc 
gctttagatacacataataaaaagaaaattgaagaaattatatttaaactagcagataaa 

gggattgccattttgtggattacgcatagtgatgaccaaagtatgcgtcattttaagcgt
agaatcacaattactgacggtaagatatcgagtgatgaggagttgaatggtaatgagtaa 

Sequence 3044 

MQQSELIGYTIEDNMKFPAEARSEAFDRDKAKQLISQVGLGNYQLDAQIEHMSGGEQQRI
TIARQLMYEPEVLLLDEATSALDTHNKKKIEEIIFKLADKGIAILWITHSDDQSMRHFKR
RITITDGKISSDEELNGNE* 

Sequence 3045 

Contig_0585_pos_5112_5516
is similar to (with p-value 2.0e-32)
>sp:sp|P77307|YBBM_ECOLI HYPOTHETICAL 28.2 KD PROTEIN IN US
HA-TESA INTERGENIC REGION.

atgttagctaataatggtttaatcgcgattaatctcgcttatcagaatttagaaagagca 
tttgttcaagatgtttctgatattgaatccaaacttacgttagcagcgacacctaagctc
gcatcaaaatcagctattagagaaagtatacgcttagcaattgttcctacaattgattct 
gtaaaaacatatggtctagtttcaattccaggtatgatgacaggattgattatcggaggc 
gttgacccacttcaagcaattaaatttcaattgcttgtcgtgtttattcatacaacagcg 
acgattatgtctgcactcattgcaacgtatatgagttacggtcagttctttaatgctcgt 
catcaactcattgctagaacgcaacgcacaagacaaagtagttaa

Sequence 3046

MLANNGLIAINLAYQNLERAFVQDVSDIESKLTLAATPKLASKSAIRESIRLAIVPTIDS 
VKTYGLVSIPGMMTGLIIGGVDPLQAIKFQLLVVFIHTTATIMSALIATYMSYGQFFNAR
HQLIARTQRTRQSS*

Sequence 3047 
Contig_0589_pos_999_559
is similar to (with p-value 1.0e-35)
>gp:gp|Y10549|BSFMS_1 B.stearothermophilus fms gene. NID: g
2266413.

atgataacaatgaaagatattataagagatggtcatccaacacttcgtgaaaaagcgaaa 
gaattaagcttcccactttctaacaatgataaagaaacattgcgcgcaatgcgtgaattt 
ctaatcaatagtcaggatgaagaaaccgcaaaacgttatggtttacgttctggcgtaggt 
ttagctgctccacaaattaatgaaccaaaacgtatgattgctgtctacttacctgatgat
ggaaacggtaaatcgtatgattatatgctcgtaaatcctaaaataatgagttacagtgta 
caagaagcttatttaccaactggcgaaggttgtctaagtgttgatgaaaacatcccaggt 
ttagtgcatcgtcatcatttcttaatatcactttcggttttaaaaccacaaaatttaact 
atcatagagattcttctttaa 


Sequence 3048 

MITMKDIIRDGHPTLREKAKELSFPLSNNDKETLRAMREFLINSQDEETAKRYGLRSGVG 
LAAPQINEPKRMIAVYLPDDGNGKSYDYMLVNPKIMSYSVQEAYLPTGEGCLSVDENIPG
LVHRHHFLISLSVLKPQNLTIIEILL* 


Sequence 3049 
Contig_0591_pos_1336_2091
is similar to (with p-value 7.0e-70) 
>sp:sp|Q06753|YACO_BACSU HYPOTHETICAL RRNA METHYLASE IN CYS
S 3'REGION. >gp:gp|D26185|BAC180K_157 B. subtilis DNA, 180 k
ilobase region of replication origin. NID: g467326. >gp:gp|Z
99104|BSUB0001_96 Bacillus subtilis complete genome (section
1 of 21): from 1 to 213080. NID: g2632267.
gtgaatgtggaagatatagtgatagtaggtagacacgcagttaaagaagcaattatatca 

ggtcacgccataaataagattttgattcaagacggtataaaaaagcaacaaattaacgac
attttaaaaaatgcaaaatcacaaaaattaattgtacaaacggtaccaaaatctaaatta 
gattttttagcaaatgcacctcaccagggtgtggctgctttagtagccccatatgaatat 
gcaaacttcgatgaatttttacaaaaacaaaagaaaaaagcccgttattcaactgttatc 
attttagatggtttagaagacccgcataatcttggctctatattaagaacagcagatgct 

tctggtgttgatgcggttattatacctaaaagacgatcagttgcgctaacacagaccgtt
gcaaaagcttctacaggagcgattcagcatgttccggttataagggttactaatctttcg 
aaaactatcgacgaattaaaagacaacggcttttggattgcggggacagaagctaataat 
gcaacggattatagagatttacaagcagatatgtcactaggtattgtaataggtagtgag 
gggcaaggtatgagtcgtttagtgagtgataagtgtgattttcatattaagattccaatg

gttggacatgtcaatagcttgaacgcgtctgtggctgcaagtttaatgatgtatgaagta
tatcgtaaacgtcatcagttagaggaaaagtcatga 

Sequence 3050 

VNVEDIVIVGRHAVKEAIISGHAINKILIQDGIKKQQINDILKNAKSQKLIVQTVPKSKL
DFLANAPHQGVAALVAPYEYANFDEFLQKQKKKARYSTVIILDGLEDPHNLGSILRTADA
SGVDAVIIPKRRSVALTQTVAKASTGAIQHVPVIRVTNLSKTIDELKDNGFWIAGTEANN
ATDYRDLQADMSLGIVIGSEGQGMSRLVSDKCDFHIKIPMVGHVNSLNASVAASLMMYEV
YRKRHQLEEKS* 


Sequence 3051 

Contig_0592_pos_3596_3057

is similar to (with p-value 3.0e-43)
>sp:sp|Q47155|DINP_ECOLI DNA-DAMAGE-INDUCIBLE PROTEIN P. >g
p:gp|D38582|ECODINJ_11 Escherichia coli genes for 'YafH, Yaf
I, YafJ, YafK, YafQ, DinJ, YafL, YafM, FhiA, MbhA, DinP, Yaf
YafO and YafP. NID: g984576. >gp:gp|D83536|ECOTSF_39 Esch
erichia coli genome, 4.0 - 6.0 min region. NID: g1208942. >g
p:gp|U70214|ECU70214_72 Escherichia coli chromosome minutes
4-6. NID: g1552727. >gp:gp|AE000131|AE000131_10 Escherichia
coli K-12 MG1655 section 21 of 400 of the complete genome. N
ID: g1786415.

atggattatttttttgctcaagttgaaatgagagataatcctaaactaaaaggaaaacct 
gtcatcgttggcggtaaagcgagtcatcgaggcgtagtttctacggcatcatacgaagca 

agagcttatggtgttcactctgctatgcctatgactcaagcacataagctatgccccaat
ggatattatgtaacaagccgttttgatacttatagagaggtatctggtcaaatcatgaaa 
atattcagaagttatacagaattagtagaacccatgtctttagatgaagcttatttagat 
attacacatttagtgagaccggatttaccagcatcaaccattgcaaattatattcgcaga 
gatatatacgaagtaacacgtttaactgcgtcagctggcgtgtcttataataagttttta 

gcaaagttagcgagtggtatgaacaagccgaatggtttgacagtaattgattacaataat
gtacatgaaatattaatgcaattagatattggagattttccaggggtaaaaagcatatag 

Sequence 3052

MDYFFAQVEMRDNPKLKGKPVIVGGKASHRGVVSTASYEARAYGVHSAMPMTQAHKLCPN
GYYVTSRFDTYREVSGQIMKIFRSYTELVEPMSLDEAYLDITHLVRPDLPASTIANYIRR
DIYEVTRLTASAGVSYNKFLAKLASGMNKPNGLTVIDYNNVHEILMQLDIGDFPGVKSI* 



Sequence 3053
Contig_0593_pos_608_1306

is similar to (with p-value 1.0e-57)
>sp:sp|Q41364|SOT1_SPIOL 2-OXOGLUTARATE/MALATE TRANSLOCATOR
PRECURSOR. >gp:gp|A47930|A47930_1 Sequence 1 from Patent WO
9534654. NID: g2301793. >gp:gp|U13238|SOU13238_1 Spinacia ol
eracea envelope membrane 2-oxoglutarate/malate translocator
(SODiT1) mRNA, chloroplast mRNA encoding chloroplast protein
, complete cds. NID: g595680.

atggcctttttcatttcaagaggatttgtaaaaacagggctaggtcgacgtattgctctg 
caattcgttaaattatttggaaagaaaacgcttggtttggcttattcacttgttggtgtt 

gaccttatcttagctcctgctacgccaagtaatacagcacgtgctggtggtattatgttt
ccaatcattaagtccttgtcagagtcatttggttcatcgccgagagatggttctgagaga 
aaaatgggtgcgtttttaatctttactgagttccaaggtaatttaattacttcagctatg 
tttttaacagctatggccggtaaccctatagcgcaaagtttagctgaaaaaacggcacac 
gttcaaattacatggatgaattggtttgttgctgctattatacccggattgatttctctc 

atcgttgtccctttcattatttataaattatacccacctactgttaaagaaacgcctaac
gctaaaaaatgggctactgaacaactagaagaaatgggacatatgtctatagccgaaaaa 
ttgatggttggtgtctttatcatagcattggctttgtgggtattaggaagcttcattaat 
gttgatgccacgctcactgcatttattgctttagcattgttactattaacaggtgtatta 
gcgtggtcagatattttaaatgaaacagattctgcctaa 


Sequence 3054 

MAFFISRGFVKTGLGRRIALQFVKLFGKKTLGLAYSLVGVDLILAPATPSNTARAGGIMF 
PIIKSLSESFGSSPRDGSERKMGAFLIFTEFQGNLITSAMFLTAMAGNPIAQSLAEKTAH 
VQITWMNWFVAAIIPGLISLIVVPFIIYKLYPPTVKETPNAKKWATEQLEEMGHMSIAEK
LMVGVFIIALALWVLGSFINVDATLTAFIALALLLLTGVLAWSDILNETDSA*

Sequence 3055 
Contig_0593_pos_6260_5115
is similar to (with p-value 4.0e-36)

>sp:sp|P37469|DNAC_BACSU REPLICATIVE DNA HELICASE (EC 3.6.1
.-). >gp:gp|D26185|BAC180K_4 B. subtilis DNA, 180 kilobase r
egion of replication origin. NID: g467326. >gp:gp|Z99124|BSU
B0021_149 Bacillus subtilis complete genome (section 21 of 2
1): from 3999281 to 4214814. NID: g2636442.

atgttctcacatgatggtatgaagtcatttatggaatatgtattcgaagtcggtaaggta 
gatcataacgaaatctatttaaaaaccacaaaagataagtcattcctagatatggacacc 
atttcaaatttgtataactcaaaatttataggttacggattctttgaaagatatcaacaa 
gatttgctcaatctttatcaaatagagcgtacgcaaaacgtattacaagaattcaattct 

gatccgaatatacaaaattttgatgaaatgcttaacaaactacaaaaggtcagtttaatt
agtgcaagtgaagaaagtgggactaaaaaaattgtagatcactttgtcgaagaattatat 
agcgaagaaccaaaacaaaaaatcaatacaggttataaactggtggattacaaaataggt
ggtttagaacctacacagttgattgtaatcgctgcgagaccgtcagtaggtaaaacgggg 
tttgcgcttaatatgatgcttaatatagcgtctcaaggctataaaacttcattcttcagt 

ctagagacaactggcgtgtctgtattgaaaaggatgttatcagcagaaactgggatagaa
ctaactcgtatcaaagaaattaaagatttagaaccggatgaattaacacgtttaacaact 
gcagcagacagaatactcaaacttgatatagatatacacgataaaagcaatattactaca 
catgatgtacgtaaacaagcgatgaagaacaaagatgtgcaacaggttatcttcattgac 
tacttacaacttatgcagacagacagtaagttagatcgtcgtaatggtatcgaaaagata 

tcgcgagatttgaagattattgcaaatgaaacaggtgcaattattgtgttgctatctcaa
ttgagcagaggtgtagaaacaagaaatgacaaaagacctatgctatctgacatgaaagaa 
gcaggtggaattgaagcagatgcaagtttagctatgttgttatatcgagatgattactac 
aaccgtgatgatgttgatgactcaggcaagtcaattgttgaatgtaacatcgcaaagaat 
aaagacggagaaacaggtgtagttgagtttgagtactacaagaaaacgcagaggttcttc 

acatga

Sequence 3056 

MFSHDGMKSFMEYVFEVGKVDHNEIYLKTTKDKSFLDMDTISNLYNSKFIGYGFFERYQQ
DLLNLYQIERTQNVLQEFNSDPNIQNFDEMLNKLQKVSLISASEESGTKKIVDHFVEELY 
SEEPKQKINTGYKLVDYKIGGLEPTQLIVIAARPSVGKTGFALNMMLNIASQGYKTSFFS
LETTGVSVLKRMLSAETGIELTRIKEIKDLEPDELTRLTTAADRILKLDIDIHDKSNITT 
HDVRKQAMKNKDVQQVIFIDYLQLMQTDSKLDRRNGIEKISRDLKIIANETGAIIVLLSQ 
LSRGVETRNDKRPMLSDMKEAGGIEADASLAMLLYRDDYYNRDDVDDSGKSIVECNIAKN
KDGETGVVEFEYYKKTQRFFT*


Sequence 3057 
Contig_0594_pos_2706_2050
is similar to (with p-value 2.0e-76) 
>sp:sp|Q59935|MANA_STRMU MANNOSE-6-PHOSPHATE ISOMERASE (EC
5.3.1.8) (PHOSPHOMANNOSE ISOMERASE) (PMI) (PHOSPHOHEXOMUTASE
). >gp:gp|D16594|STRPMI_2 S.mutans pmi gene for mannosephosp
hate isomerase (complete cds) and scrK gene for fructokinase
(partial cds). NID: g451214.
atgagtattttagttattggagcaaatggaggcgtaggttctaaactagtaagtcaatta 

aatgaagaacacgttgattttacagctggtgtacgtaaggaagatcaagttaaagaatta
gaaaataaagggattaaagctatattaatagatgtagaaaaaaatagtattaatgattta 
aaaaatatctttacagattatgataaagttatcttttcagttggatctggtggaagcact 
ggagcggataaaacaatcattgttgatttagatggtgctgtaaaaacaattaaagctagt 
aaagaagcgggtatcaaacattatgttatggtatcaacatacgattctagacgtgaagca

ttcgatgcgagtggagatttaaaaccgtatacaattgcaaagcattatgctgatgattac
ttaagaacatcagatcttaattatacaattgtacatccaggttcacttacagatgatgct 
ggaactggaaaaatagaagctgatttatatttcgacaaagcaggatcaattccacgtgaa 
gatgttgctacagttttaaaagaagtagtaacttctgatggttttaataatcaagaattc 
caaattttaagtggcaatcatggtgttaaagatgcattgaaaaactatgaatcttaa
Sequence 3058

MSILVIGANGGVGSKLVSQLNEEHVDFTAGVRKEDQVKELENKGIKAILIDVEKNSINDL 
KNIFTDYDKVIFSVGSGGSTGADKTIIVDLDGAVKTIKASKEAGIKHYVMVSTYDSRREA 
FDASGDLKPYTIAKHYADDYLRTSDLNYTIVHPGSLTDDAGTGKIEADLYFDKAGSIPRE
DVATVLKEVVTSDGFNNQEFQILSGNHGVKDALKNYES*

Sequence 3059 

Contig_0594_pos_1408_470

is similar to (with p-value 3.0e-28)

>sp:sp|Q04304|YMY0_YEAST HYPOTHETICAL 24.9 KD PROTEIN IN RC
A1-NPL6 INTERGENIC REGION. >pir:pir|S54466|S54466 hypothetic
al protein YM9582.15 - yeast (Saccharomyces cerevisiae) >gp:
gp|Z49259|SC9582X_15 S. cerevisiae chromosome XIII cosmid 958
2. NID: g807956.

atgccgttatttttaaaacccatttttctggataaagtatggggcagtgataatcttcgt 
caatttgggtatcaactacctaataatcacataggtgaatgttggggaatttcagcacat 
ccacacggaaaaagtgtgattgaaaatggtatatttgctggtcaaacattggatcaagta 
tggaacaatcatagagaaatatttggagattttccaagtaaagattttccattaatggct 

20 aaaattgtagatgctgctgcgccattgtctattcatgtacatcccgacgattcttatgca 
tatgaacacgaagaaggtcaatatggaaaatctgaatgttggtacatcattgaagctgat 
gaaggtgcaaagattactataggtacgtatgcgaaatctcgtgatgaatttgaagagcaa 
ttggagcaaggtacatttgaaaattatttgagaacaatacaagtgcaaccaggtgatttt 
tactttataccagctggaacgatacattccataggtgcaggcattatggcgtatgaagtc 

25 atgcaatcatcagatatttcatatagaatttatgattatcatagaaaaactgataatagt 
gaggaacgtgaattaaatatagataaggcattagatgttattaattattcaaatgaacta 
cctaatatcactcctcaaaatgaagtgatagaaaatcacaattgtacacatattgtatct 
agtgatttttttactatggttaagtgggatatttctggtactctaaattatatgaagcct 
agagaattttgtcttgtttctgttttagatggacaaggtaaacttattgtagatggtgat 

30 atatatgatatatctaaaggttcaaactttgtgttaacttctgaagatttagatagtgtt 
ttcgaaggagattttaaactaatcattagttacatttaa 

Sequence 3060 

MPLFLKPIFLDKVWGSDNLRQFGYQLPNNHIGECWGISAHPHGKSVIENGIFAGQTLDQV 
WNNHREIFGDFPSKDFPLMAKIVDAAAPLSIHVHPDDSYAYEHEEGQYGKSECWYIIEAD
EGAKITIGTYAKSRDEFEEQLEQGTFENYLRTIQVQPGDFYFIPAGTIHSIGAGIMAYEV 
MQSSDISYRIYDYHRKTDNSEERELNIDKALDVINYSNELPNITPQNEVIENHNCTHIVS 
SDFFTMVKWDISGTLNYMKPREFCLVSVLDGQGKLIVDGDIYDISKGSNFVLTSEDLDSV 
FEGDFKLIISYI* 


Sequence 3061 
Contig_0595_pos_2494_3402
is similar to (with p-value 2.0e-60) 
>sp:sp|P44331|RBSK_HAEIN RIBOKINASE (EC 2.7.1.15). >pir:pir
|B64073|B64073 ribokinase (rbsK) homolog - Haemophilus influ
enzae (strain Rd KW20) >gp:gp|U32732|U32732_6 Haemophilus in
fluenzae Rd section 47 of 163 of the complete genome. NID: g
1573480.

gtgattgtaattggatcaacaaatgtagataaatttcttaatgttaaaaggtttccaaaa 
cccggtgagacattacatattaatcaagctcaaaaggagtttggtgggggcaagggagcc
aatcaagccatagcagctagtagattagcagcagatacaacatttatcagtaaagttggt 
aaagatggcaatgccaactttatattggaagatttcaaaaaagcaggtattcatacacaa 
tatattttaacttcagaaagtgaagaaactgggcaagcatttatcactgttgatgaagca
ggacaaaatacgattcttgtttacggtggtgcgaatatgacattaagtgcaactgatgtt 
gagatgagtgcggatgcgtttattggtgcagactttgttgtagcgcagcttgaagttcca
tttgaggcgatagaacaagcatttaaaattgctcgtaaacaaaatatcactactgtatta 
aatcctgcaccggcaattgaattgcctaagtcacttttagagttaactgatataattatt 
ccaaacgaaacggaagcagaattattaacaggtatttcaatcaataatgaaagtgatatg 
aaagaaacagcaacatattttctcgatttaggtatatctgcagtattaattactttaggg
gagcaaggcacgtattgtgcatatcaagaacaatacaaaatgattcctgcgtgtaatgta
aaagcaatagatacgacagcagcaggagatacatttataggtgcttttttaagtgagtta 
aataaagacttgagtaatatagaatcggctattcgacttgcaaatcaagcgtcgtctcta
acggtacaacgaaaaggagcacaagcttctataccaacacgtaaagaagtagaggcagaa 
tataattaa



Sequence 3062 

VIVIGSTNVDKFLNVKRFPKPGETLHINQAQKEFGGGKGANQAIAASRLAADTTFISKVG 
KDGNANFILEDFKKAGIHTQYILTSESEETGQAFITVDEAGQNTILVYGGANMTLSATDV
EMSADAFIGADFVVAQLEVPFEAIEQAFKIARKQNITTVLNPAPAIELPKSLLELTDIII
PNETEAELLTGISINNESDMKETATYFLDLGISAVLITLGEQGTYCAYQEQYKMIPACNV 
KAIDTTAAGDTFIGAFLSELNKDLSNIESAIRLANQASSLTVQRKGAQASIPTRKEVEAE 
YN* 

Sequence 3063

Contig_0596_pos_4550_3993 

is similar to (with p-value 3.0e-45) 
>sp:sp|P20368|ADH1_ZYMMO ALCOHOL DEHYDROGENASE I (EC 1.1.1.
1) (ADH I). >pir:pir|A35260|A35260 alcohol dehydrogenase (EC
1.1.1.1) I - Zymomonas mobilis >gp:gp|M32100|ZMOADHA_1 Z.mo
bilis alcohol dehydrogenase I (adhA) gene, complete cds. NID
: g155570.

gtgagtggtatcgaaccaggacaatggttaggcgtatttggtgtgggaggattaggtaat 
ttagcattgcaatacgccaaaaacgtaatgggcgcgaaagtcgttgcatttgacattaat 

gatgataaattaaattttgctaaagagcttggtgctgatgcaatcataaattcaactaat
gttgatcctattgaggaagttaatcgtctaacgaataataaaggattagatgcaacggtg 
attactgctgtagctaaaacaccttttaatcaagcagttgatgttgttaaggcgggtgca 
cgtgtggtagcagtaggacttccagtagataaaatggatttagatattccacgtttagtg 
cttgatggaattgaagtcgttggttcattagttggtaccagacaagatttaagagaagca 

tttcaatttgctgccgaaaataaagttattcctaaaatccaattaagacaattatctgaa
attaacgatatttttgatgaaatggaaaaaggaacaattacgggtcgaatggtaattgat 
atgaaaagcacgcactga 

Sequence 3064 

VSGIEPGQWLGVFGVGGLGNLALQYAKNVMGAKVVAFDINDDKLNFAKELGADAIINSTN
VDPIEEVNRLTNNKGLDATVITAVAKTPFNQAVDVVKAGARVVAVGLPVDKMDLDIPRLV
LDGIEVVGSLVGTRQDLREAFQFAAENKVIPKIQLRQLSEINDIFDEMEKGTITGRMVID
MKSTH* 



Sequence 3065

Contig_0598_pos_4948_5679 
is similar to (with p-value 3.0e-65)
>sp:sp|P18843|NADE_ECOLI NH(3)-DEPENDENT NAD(+) SYNTHETASE
(EC 6.3.5.1) (NITROGEN-REGULATORY PROTEIN). >gp:gp|D90817|D9
0817_9 E.coli genomic DNA, Kohara clone #326(39.1-39.4 min.)
. NID: g1742837. >gp:gp|D90818|D90818_4 E.coli genomic DNA,
Kohara clone #327(39.2-39.5 min.). NID: g1742849. >gp:gp|AE0
00269|AE000269_3 Escherichia coli K-12 MG1655 section 159 of
400 of the complete genome. NID: g1788033.

atgctaatagataaagcaagatcatttattcagaccatgtatagcgaattaaaatataat
actaatgaaattgaaaatagaatgaaagagattgagcaagaaattaacttgactggtagt 
tacacacatacttatgaagaattatcttacggtgcaaaaatggcatggagaaactcaaat 
cgttgtattggtagactgttttggaattctttaaatgttaaagatgcccgagatgtatgt 
gacgaaaaagaatttataaaatttatacatacacatattaaagaagctactaacggcgga 

aaaatcaaaccatatattacaatttttagtcctgaagatacacctaaaatttataataat
cagttgattcgttatgctggttatgaaaatgttggcgatccatctgaaaaaaaggttact 
cgtttagctgaacatctaggttggaaaggtaaaggttcaaattttgatattttacctctg 
atttatcaattgcctaacgacactataaaaatacacgaacttccaaatgatattgtaaaa 
gaagtttctatacatcatgaacactatcccaagctttcaaaattaggtttaaaatggtat 
gcggtacctattatttcaaatatggatttaaaaatcggtggtattacttaccctacagca
ccttttaatggatggtatatggtaaccgaaattgctgtacgtaatttcacagacacctat 
cgtcttatgtaa 

Sequence 3066

MLIDKARSFIQTMYSELKYNTNEIENRMKEIEQEINLTGSYTHTYEELSYGAKMAWRNSN 
RCIGRLFWNSLNVKDARDVCDEKEFIKFIHTHIKEATNGGKIKPYITIFSPEDTPKIYNN 
QLIRYAGYENVGDPSEKKVTRLAEHLGWKGKGSNFDILPLIYQLPNDTIKIHELPNDIVK 
EVSIHHEHYPKLSKLGLKWYAVPIISNMDLKIGGITYPTAPFNGWYMVTEIAVRNFTDTY 
RLM*

Sequence 3067
Contig_0598_pos_4756_3287
unknown 

gtgtaccaatataacgacgatagcttaatgttacacaatgatttatatcaaattaatatg
gctgaaagctactggaatgatggtatccatgaaagaatagcagtgtttgatttgtatttt 
cgaaaaatgccatttaatagtggatatgcggtattcaacggattgaaacgcgttgtgaat 
ttcatcgaaaactttgggtttacaaatgaagatatcacatatttaaaatcgataggttat 
gaagaagattttctaaattacctaaaagatttgaaatttacagggaatattaaatctatg 

caagaaggtgaaatttgttttggtaatgagccattattaagagttgaagcacctttaatc
caagcgcaacttattgaaactattttgttaaatatcattaatttccaaacattaattgca 
actaaagccagccgaattcgtcaaatagcaactcatgacactttgatggaatttggtaca 
agaagagctcaagagatcgatgctgcactgtggggcgctagagcagcctttattggaggg 
tttgattctacaagtaatgttagagcaggaaaactttttaatatacctgtatctggcaca 

catgcacacgcactagtacaaacatatggtgatgagtatatagcattcaaaaagtatgct
gagcgacataaaaattgtgtgttcttagttgatacttttcatactttaaaatcaggagta 
ccaaccgcaattaaggttgcaaaagagttaggagatactattaattttataggtatcaga 
ttagattctggtgatattgcgtacctatctaaagaagctcgtagaatgttagatgaggct 
ggttttacagaagctaaaattatcgcatcaaatgatttggatgagcagactattacaagt 

ttaaaagcacaaggcgctaaagttgacggatggggagtaggtacaaaactgattacagga
tatgatcaaccagccttaggtgcagtttataaattggtttctattgaaacagatgatggc 
acaatgagtgatcgcattaaattatcaaataatgctgagaaagttactacaccaggcaaa 
aaaaatgtttatcgtattattaataataaaacaggcaaggctgagggcgactatattacg 
ctagaaggtgaaaatcctaatgacgaatctccattgaaaatgttccatcctgttcacact 

tacaaaatgaagtttattaaatcatttaaagcggttaatctacatcaatctatatttgaa
aatggcaaacttgtataccatcttccagatgaatatgaagctcaggactatcttaaaaat 
aatttaagtattttatgggaagaaaataaacgatatcttaacccgcaagattatccagta 
gatttaagcactaaatgttgggaaaataagcataagcgtatttttgaagttgctgaacac
gttaaagagatggaggatgaaaatgagtag


Sequence 3068 

VYQYNDDSLMLHNDLYQINMAESYWNDGIHERIAVFDLYFRKMPFNSGYAVFNGLKRVVN
FIENFGFTNEDITYLKSIGYEEDFLNYLKDLKFTGNIKSMQEGEICFGNEPLLRVEAPLI 
QAQLIETILLNIINFQTLIATKASRIRQIATHDTLMEFGTRRAQEIDAALWGARAAFIGG 

FDSTSNVRAGKLFNIPVSGTHAHALVQTYGDEYIAFKKYAERHKNCVFLVDTFHTLKSGV
PTAIKVAKELGDTINFIGIRLDSGDIAYLSKEARRMLDEAGFTEAKIIASNDLDEQTITS
LKAQGAKVDGWGVGTKLITGYDQPALGAVYKLVSIETDDGTMSDRIKLSNNAEKVTTPGK
KNVYRIINNKTGKAEGDYITLEGENPNDESPLKMFHPVHTYKMKFIKSFKAVNLHQSIFE 
NGKLVYHLPDEYEAQDYLKNNLSILWEENKRYLNPQDYPVDLSTKCWENKHKRIFEVAEH 

VKEMEDENE*

Sequence 3069 
Contig_0598_pos_3057_2467
is similar to (with p-value 5.0e-55)
>pir:pir|A47501|A47501 nitric-oxide synthase (EC 1.14.13.39)
, endothelial - human >gp:gp|M93718|HUMNIOXSYN_1 Human nitr
ic oxide synthase mRNA, complete cds. NID: g189211. >gp:gp|L
10709|HUMNITOX17_1 Human constitutive endothelial nitric oxi
de synthase gene, exons 25 and 26 and complete cds. NID: g34
8235. >gp:gp|M95296|HUMNOS_1 Human nitric oxide synthase mRN
A, complete cds. NID: g189259. >gp:gp|L26914|HUMNOSA_1 Human
nitric oxide synthase mRNA, complete cds. NID: g434699.

gtgaaattaccttatggtgtgcaacaagacgctcatgaagtagaagatgcacttgagttt 
attaatcctgacacaacatatacagttaatattaaaccagcagttgatcagagtgttcaa
tcacttagtgaagcaggcatcaaacttactgattttcaaaagggtaatgaaaaagcacgt 
gaacgaatgaaagttcaattttcaattgcttctaatactcaaggtatagttttaggaact 
gatcactctgccgaaaatattacaggattttacactaaatatggagatggtgctgcggac 
attgcgcctatctttgggttaaataaaagacaaggtaaacaattactagcttatctagga 
gcacctaaacacctttatgaaaaagtgccaacagctgatttagaagatgataaacctcag
ttaccagacgaggaagcactaggcgtatcttatcatgatattgatgattatttagaaggt 
aaagaaattcctgcaactgctcgtgaaacaatcgaaaaacattatgttagaaatgcacat 
aagcgtgaacttgcttatacacgatattcatggcctaaatataacaaatga 

Sequence 3070 

VKLPYGVQQDAHEVEDALEFINPDTTYTVNIKPAVDQSVQSLSEAGIKLTDFQKGNEKAR 
ERMKVQFSIASNTQGIVLGTDHSAENITGFYTKYGDGAADIAPIFGLNKRQGKQLLAYLG 
APKHLYEKVPTADLEDDKPQLPDEEALGVSYHDIDDYLEGKEIPATARETIEKHYVRNAH 
KRELAYTRYSWPKYNK* 

Sequence 3071 
Contig_0605_pos_6367_0
is similar to (with p-value 2.0e-46) 

>sp:sp|Q43157|RPE_SPIOL RIBULOSE-PHOSPHATE 3-EPIMERASE PREC
URSOR (EC 5.1.3.1) (PENTOSE-5-PHOSPHATE 3-EPIMERASE) (PPE)
(RPE). >gp:gp|AF070941|AF070941_1 Spinacia oleracea ribulose
-phosphate 3-epimerase (RPE) mRNA, nuclear gene encoding chl
oroplast protein, complete cds. NID: g3264787. >gp:gp|L42328
|SPIR5P3E_1 Spinacia oleracea nuclear-encoded chloroplast ri
bulose-5-phosphate 3-epimerase mRNA, complete cds. NID: g116
2979.

atggttaaaattttaccatcacttttatctatagattttttaaatttaaaagaagagctt 
caattgttagaaacagcaaaggtagacggattacactttgatgtaatggacggtaaattt 
gtccctaatatttcaatcggtattccgattttggatgctgttagacaacaatctcatttg 
ccaatagatgttcatttaatgattgagcaacctgaaaattatattaatctttttgccgaa 
catggtgctgatatgatttctgttcatgttgagtcgacaacacatatacatagagcaatt 
gaacaaattaaacaattagggaaaaaagcaggtgtcgtcatcaatcctggaacatctgta 
gaaacaattttacctatattgagtattgttgattatgttctagtaatgactgtaaatcct 
ggttttggtggacaaacattcatagaacaatgcgtgactaagattgagcaattaaatcaa 
cttaaacatgaaaatcatttaacttttgatattgaggtagatggaggcattaacgatcaa 
acgagtaaacgatgtgtaga 

Sequence 3072 

MVKILPSLLSIDFLNLKEELQLLETAKVDGLHFDVMDGKFVPNISIGIPILDAVRQQSHL 
PIDVHLMIEQPENYINLFAEHGADMISVHVESTTHIHRAIEQIKQLGKKAGVVINPGTSV
ETILPILSIVDYVLVMTVNPGFGGQTFIEQCVTKIEQLNQLKHENHLTFDIEVDGGINDQ 
TSKRCVX 

Sequence 3073
Contig_0607_pos_1070_726

is similar to (with p-value 1.0e-60)
>pir:pir|I67760|I67760 transposase (insertion sequence IS10
) - Escherichia coli >gp:gp|S67119|S67119_2 BST=somatotropin
...BST/beta-Gal fusion protein [Escherichia coli, LBB84, pla
smid pXT107, IS10L/R-1, PlasmidTransposonInsertionMutant, 3
genes, 1679 nt]. NID: g455674.

atgcagattgaagaaaccttccgagacttgaaaagtcctgcctacggactaggcctacgc 
catagccgaacgagcagctcagagcgttttgatatcatgctgctaatcgccctgatgctt 
caactaacatgttggcttgcgggcgttcatgctcagaaacaaggttgggacaagcacttc 
caggctaacacagtcagaaatcgaaacgtactctcaacagttcgcttaggcatggaagtt
ttgcggcattctggctacacaataacaagggaagacttactcgtggctgcaaccctacta 
gctcaaaatttattcacacatggttacgctttggggaaattatga 

Sequence 3074

MQIEETFRDLKSPAYGLGLRHSRTSSSERFDIMLLIALMLQLTCWLAGVHAQKQGWDKHF 
QANTVRNRNVLSTVRLGMEVLRHSGYTITREDLLVAATLLAQNLFTHGYALGKL* 

Sequence 3075 

Contig_0608_pos_9823_8729

is similar to (with p-value 3.0e-52)
>sp:sp|Q43157|RPE_SPIOL RIBULOSE-PHOSPHATE 3-EPIMERASE PREC
URSOR (EC 5.1.3.1) (PENTOSE-5-PHOSPHATE 3-EPIMERASE) (PPE)
(RPE). >gp:gp|AF070941|AF070941_1 Spinacia oleracea ribulose
-phosphate 3-epimerase (RPE) mRNA, nuclear gene encoding chl
oroplast protein, complete cds. NID: g3264787. >gp:gp|L42328
|SPIR5P3E_1 Spinacia oleracea nuclear-encoded chloroplast ri
bulose-5-phosphate 3-epimerase mRNA, complete cds. NID: g116
2979.

atgataactgcagaaaaaaagaagaaaaacaaattcttacctaatttcgaaaaacaatcg
atctactccttaagatatgacgagatgcaacaatggcttattgatcacggacaacaaaaa 
ttcagagcaaaacaaatttttgaatggttataccaaaagcgtgtgaatactattgatgaa 
atgactaacctgtctaaagagttacgtcaaattctcaaagatcattttgcaatgacgaca 
ttgaccactgttgttaaacaagaaagtaaagatggaacaattaagttcttatttgaatta 

caagatggttatactattgaaactgttttaatgagacatgaatatggaaattctgtctgt
gtaacaacacaagtaggatgtagaattggttgtacgttttgtgcttccactttgggcgga 
ttaaagcgtaatttagaggccggagagattgtctctcaagtattaactgtacaaaaggca 
ctagacgaaacgaatgaacgtgtatcacaaattgtcattatgggcataggtgaacctttc 
gagaattatgatgaaatgatggatttcttaagaattgttaatgatgataacagtttaaat 

attggtgcacgtcatattactgtatctacttcaggaattattccaagaatttatgatttt
gccgaagaagatatacaaataaattttgctgtgagtcttcatggtgctaaagacgaaata 
agatcaagattaatgcctatcaatcgtgcttataacgttgataagttaatggaagctatt 
cgttattatcaagaaaagacaaatcgccgtgttacttttgaatatggattgtttggtggt 
gttaatgaccaacttgaacatgcgagagatttggcacatttaattaagaatctcaattgc 

cacgttaatttaataccagttaaccatgtcccagaaagaaattatgtaaagacaccaaaa
gatgatatctttaaattcgagaaggaattaaagagattaggaattaatgctacaattaga 
cgtgagcaagggtcagatattgatgctgcgtgtggacaattaagagcgaaggaacgacaa 
gtagaaacgaggtaa 

Sequence 3076

MITAEKKKKNKFLPNFEKQSIYSLRYDEMQQWLIDHGQQKFRAKQIFEWLYQKRVNTIDE 
MTNLSKELRQILKDHFAMTTLTTVVKQESKDGTIKFLFELQDGYTIETVLMRHEYGNSVC
VTTQVGCRIGCTFCASTLGGLKRNLEAGEIVSQVLTVQKALDETNERVSQIVIMGIGEPF 
ENYDEMMDFLRIVNDDNSLNIGARHITVSTSGIIPRIYDFAEEDIQINFAVSLHGAKDEI 

RSRLMPINRAYNVDKLMEAIRYYQEKTNRRVTFEYGLFGGVNDQLEHARDLAHLIKNLNC
HVNLIPVNHVPERNYVKTPKDDIFKFEKELKRLGINATIRREQGSDIDAACGQLRAKERQ 
VETR* 

Sequence 3077 

Contig_0608_pos_4794_4123

is similar to (with p-value 7.0e-63) 

>sp:sp|P36979|YFGB_ECOLI HYPOTHETICAL 43.1 KD PROTEIN IN ND
K-GCPE INTERGENIC REGION. >gp:gp|D90881|D90881_3 E.coli geno
mic DNA, Kohara clone #428(56.8-57.0 min.). NID: g1799913. >
gp:gp|D90882|D90882_2 E.coli genomic DNA, Kohara clone #429(
56.9-57.2 min.). NID: g1799919. >gp:gp|U02965|ECU02965_1 Esc
herichia coli K12 ORF384 gene, complete cds, and ORF337 gene
, partial cds. NID: g493518. >gp:gp|AE000338|AE000338_3 Esch
erichia coli K-12 MG1655 section 228 of 400 of the complete
genome. NID: g1788862.

atggttaaaattttaccatcacttttatctatagattttttaaatttaaaagaagagctt 
caattgttagaaacagcaaaggtagacggattacactttgatgtaatggacggtaaattt 
gtccctaatatttcaatcggtattccgattttggatgctgttagacaacaatctcatttg 
ccaatagatgttcatttaatgattgagcaacctgaaaattatattaatctttttgccgaa
catggtgctgatatgatttctgttcatgttgagtcgacaacacatatacatagagcaatt 
gaacaaattaaacaattagggaaaaaagcaggtgtcgtcatcaatcctggaacatctgta 
gaaacaattttacctatattgagtattgttgattatgttctagtaatgactgtaaatcct 
ggttttggtggacaaacattcatagaacaatgcgtgactaagattgagcaattaaatcaa 
cttaaacatgaaaatcatttaacttttgatattgaggtagatggaggcattaacgatcaa
acgagtaaacgatgtgtagaacagggtgctacaatgttagtcactggttcatacttcttt 
aaacaagaggattatgcaaaagtaactagaaactttatgtcatattacagttttatcata 
tatcatttatag 

Sequence 3078

MVKILPSLLSIDFLNLKEELQLLETAKVDGLHFDVMDGKFVPNISIGIPILDAVRQQSHL 
PIDVHLMIEQPENYINLFAEHGADMISVHVESTTHIHRAIEQIKQLGKKAGVVINPGTSV
ETILPILSIVDYVLVMTVNPGFGGQTFIEQCVTKIEQLNQLKHENHLTFDIEVDGGINDQ 
TSKRCVEQGATMLVTGSYFFKQEDYAKVTRNFMSYYSFIIYHL* 


Sequence 3079 

Contig_0512_pos_14901_13966
is similar to (with p-value 1.0e-57)
>gp:gp|AB009866|AB009866_17 Bacteriophage phi PVL proviral
DNA, complete sequence. NID: g3341907.

atgcacttatctaaaatattaaaacaaggtaaaatcaaagctggagaaccaatggctaaa 
acaggtaattcaggtcaatggactactggtccacacgtacacttccaagttgaaagaggt 
cgccatgatgacatcacaaacagagggacagtaaaccctgctaaatggctcaaaggtcac 
ggtggtggaaaagttggtggtagtggttctgtaaacgcacgtagagcaattcaaagagca 

caatctattttaggtggacgttataaatcgtcttatattaccgaacaaatgatgagagtt
gccaaacgtgagtctaacttccaatcagatgcggttaataactgggacatcaacgcacaa 
aaaggaacgccttctaaaggtatgttccaaatgattgaaccatcttttagagcatatgct 
aaaccaggacacggaaacatcttaaatccaactgacgaagctatatctgctatgcgttac 
attgtaggtaagtgggttcctattatggggagttggagaagtgcatttaaacgtgctgga 

gattatgcttatgctacaggcggggttattaacactgctggattatataatttggcagaa
gatggataccctgagatagtaatccctacagatccaagcagacaatcagatgcgatgaaa 
ttgttacatcttgctgcaagtaaaattagtggaaataacagaaataaacgacctaaccaa 
ttacgtacacctaatgttactagtaatacagttgataatgcagaattactactacaaatg 
atagaaaatcaacagaaacaaataaacgtgttaatggaaatagcacgaagtaataaaact 

attgaaaaacaaccgaaaggtttttcagaacgcgatgtaagtcaggcacaaggttcaagg
ttaagactcgctgcttatagccagggaggtttataa 

Sequence 3080

MHLSKILKQGKIKAGEPMAKTGNSGQWTTGPHVHFQVERGRHDDITNRGTVNPAKWLKGH 
GGGKVGGSGSVNARRAIQRAQSILGGRYKSSYITEQMMRVAKRESNFQSDAVNNWDINAQ
KGTPSKGMFQMIEPSFRAYAKPGHGNILNPTDEAISAMRYIVGKWVPIMGSWRSAFKRAG
DYAYATGGVINTAGLYNLAEDGYPEIVIPTDPSRQSDAMKLLHLAASKISGNNRNKRPNQ
LRTPNVTSNTVDNAELLLQMIENQQKQINVLMEIARSNKTIEKQPKGFSERDVSQAQGSR
LRLAAYSQGGL* 


Sequence 3081
Contig_0612_pos_6018_3844

>pir:pir|S01788|S01788 formate C-acetyltransferase (EC 2.3.
1.54) - Escherichia coli >gp:gp|D90728|D90728_3 Escherichia
coli genomic DNA. (20.3 - 20.7 min). NID: g1651424. >gp:gp|X
08035|ECPFL_1 E. coli pfl gene for pyruvate formate-lyase (E
C 2.3.1.54). NID: g42369. >gp:gp|AE000192|AE000192_6 Escheri
chia coli K-12 MG1655 section 82 of 400 of the complete geno
me. NID: g1787125.
gtgagagaattcatacaattgaactattcattatatgaaggtgacgatgaatttttagaa
ggtcctacaaaagcaactgaaactttatgggatcaagtaatgcaattatcaaaagaagaa 
cgtgagcgcggtggcatgtgggacatggatactaaagtggcatcgacaatcacttctcat 
gacgctggttacttagacaaagatttagaaaaagttgttggtgttcaaactgaaaaacca 
ttcaaacgttctatgcaaccattcggtggtattcgtatggcaaaagcagcatgtgaagcg
tatggttacgaattagatccagaaacagaaaaaatcttcactgaatatcgtaaaacacac 
aaccaaggtgtattcgatgcatattcaagagaaatgttaaactgtcgtaaagctggtatt 
attactggtttgccagatgcttacggacgtggacgtattatcggagactatcgtcgtgtt 
gctttatacggtgtagatttcttaatggaacaaaaacttaaagactttaacacaatgtct 

actgaaatgtctgaagatgtaattcgtttacgtgaagaattatcagagcaatatcgttca
cttcaagatttaaaagaattaggtcaaaaatatggatttgatattagccgtcctgctact 
aacttcaaagaagctgtgcaatggttatacttagcatatttagctgctatcaaagaacaa
aatggtgcagcaatgagtttaggacgtacttcaacattcttagatatttatgctgaacgt 
gatttacaaaatggtgacatcactgaacaagaagttcaagaaatcattgaccacttcatt 

atgaaattgcgtatcgttaaattcgcgcgtacgcctgaatataatgaattattctctgga
gatccaacttgggtaactgaatctatcggtggtgtaggtattgacggacgtccaatggta 
actaaaaactcattccgtttcttacactcattagataatttaggtccagcaccagaacca 
aacttaacagtgttatggtctactcgcttacctgaaaacttcaaaatctattgtgctaaa 
atgagtattaaaacgagctcaatccaatatgaaaatgatgacttaatgcgtgaaagctat 

ggcgatgattatggtatcgcttgctgtgtatctgccatgaagattggtaaacaaatgcaa
ttcttcggtgcacgtgctaacttagctaaagcattactttacgctatcaatggtggtaaa 
gatgaaaaatctggtaaacaagttgggccaagttatgaaggtattaaatcagacgtacta 
gattatgatgaagtcttcgaaagatatgaaaaaatgatggactggttagctggcgtatat 
atcaactcattaaatatcattcactatatgcatgataaatatagctatgaacgtcttgaa 

atggctttacatgatacagaaattattcgcacaatggcaactggtattgccggattgtct
gtagcagctgactctttatcagcgattaaatatgcacaagttaaacctatccgtaacgaa 
gaaggtcttgtaactgactttaaaatcgaaggcgacttccctaaatatggtaataatgac 
agtcgtgttgatgaaattgcagtagatttagttgaacgtttcatgactaaattacgtagc 
cataaaacataccgtaattctgaacacacaatgagtgtattaacaattacttcaaacgtt 

gtttatggtaagaaaactggtaacacaccagatggacgtaaagctggcgaaccatttgca
cctggcgcaaacccaatgcatggtcgtgaccaaaaaggtgcattatcttcactaagttca 
gtagctaaaataccttacgattgctgtaaagatggtatctcaaatacatttagtatcgta 
ccgaaatcactaggtaaagaagaagcagatcaaaataaaaacttaactagtatgttagat 
ggttatgcaatgcaacatggtcatcacctcaacattaacgtatttaatagagaaacatta 

attgatgcaatggaacatccagaagagtatccacaattaacgattcgtgtatctggatac
gctgtaaacttcattaaattaacacgtgaacaacaattagatgttatttcacgtacattc 
cacgaatctatgtaa 

Sequence 3082 

VREFIQLNYSLYEGDDEFLEGPTKATETLWDQVMQLSKEERERGGMWDMDTKVASTITSH
DAGYLDKDLEKVVGVQTEKPFKRSMQPFGGIRMAKAACEAYGYELDPETEKIFTEYRKTH
NQGVFDAYSREMLNCRKAGIITGLPDAYGRGRIIGDYRRVALYGVDFLMEQKLKDFNTMS 
TEMSEDVIRLREELSEQYRSLQDLKELGQKYGFDISRPATNFKEAVQWLYLAYLAAIKEQ 
NGAAMSLGRTSTFLDIYAERDLQNGDITEQEVQEIIDHFIMKLRIVKFARTPEYNELFSG 

DPTWVTESIGGVGIDGRPMVTKNSFRFLHSLDNLGPAPEPNLTVLWSTRLPENFKIYCAK
MSIKTSSIQYENDDLMRESYGDDYGIACCVSAMKIGKQMQFFGARANLAKALLYAINGGK
DEKSGKQVGPSYEGIKSDVLDYDEVFERYEKMMDWLAGVYINSLNIIHYMHDKYSYERLE
MALHDTEIIRTMATGIAGLSVAADSLSAIKYAQVKPIRNEEGLVTDFKIEGDFPKYGNND
SRVDEIAVDLVERFMTKLRSHKTYRNSEHTMSVLTITSNVVYGKKTGNTPDGRKAGEPFA

PGANPMHGRDQKGALSSLSSVAKIPYDCCKDGISNTFSIVPKSLGKEEADQNKNLTSMLD
GYAMQHGHHLNINVFNRETLIDAMEHPEEYPQLTIRVSGYAVNFIKLTREQQLDVISRTF 
HESM* 

Sequence 3083 
Contig_0613_pos_918_1343

is similar to (with p-value 1.0e-32)

>gp:gp|AB009866|AB009866_61 Bacteriophage phi PVL proviral
DNA, complete sequence. NID: g3341907. 

atgacaaatacattagaaattaaattattatcagaaaacgcgactatgccgaagagagca 
aattctacagatagtggattggacttatacgtatcagaaacgattaacattcctgcacac
gcaactaaagtagttaaaacagatatagcgattaatctgccttatgggtatgaggcgcaa 
gtaagacctagatctggtaaatcacttaaaactaaattgcgtgtagcactaggaacaata 
gaccaaacataccacaaagaaataggtatcatcacagataacataggtaatgaagatatc 
acagtagaaaaaggagaaagattagcgcaattagttgtagcgccagttgtatatcctaca
cccaaacaggttgattggtttgaaaatgaaagcgacagaggtgcatatggaagcacagga 
gaataa 

Sequence 3084 

MTNTLEIKLLSENATMPKRANSTDSGLDLYVSETINIPAHATKVVKTDIAINLPYGYEAQ
VRPRSGKSLKTKLRVALGTIDQTYHKEIGIITDNIGNEDITVEKGERLAQLVVAPVVYPT
PKQVDWFENESDRGAYGSTGE*

Sequence 3085
Contig_0613_pos_2864_3259

is similar to (with p-value 2.0e-21) 

>gp:gp|AB009866|AB009866_1 Bacteriophage phi PVL proviral D
NA, complete sequence. NID: g3341907. 

gtgaaagaaatgatatataactataagtggatgaccaatattataacatcacaaatttac 
gatgcagacagtacatctattgcacaatatggtatagaatctgttatgcctaaggctaaa
ggacaagcaggcgataaagtgtttttgaaggttatgaatagaaataaagcgtggagacgt 
aatttaaaactcatagagaagattgagtttattgacaagtatgaggagtacataactaac 
gagatgaactttcatatattgcagatgattaagttaggtacgaaacttaaaactgttatg 
gacttaatggaaataaaaagtaaatcaacattctatggttgtttaaatgaaatagtaaac 
gtatatatggatgcacagaaaggatattacgattag

Sequence 3086

VKEMIYNYKWMTNIITSQIYDADSTSIAQYGIESVMPKAKGQAGDKVFLKVMNRNKAWRR 
NLKLIEKIEFIDKYEEYITNEMNFHILQMIKLGTKLKTVMDLMEIKSKSTFYGCLNEIVN 
VYMDAQKGYYD*

Sequence 3087 
Contig_0613_pos_4377_4844 
>gp:gp|X97563|BPHA3GP3_5 Bacteriophage A2 gp3 gene and 4 op
en reading frames. NID: g1523807.

atgccgccaagaaaattattatctcaacaaaaaggtaatttaacagtcgaacaacaagaa 
aataaagaaaatgcagaaaaagcgatggcgcaactcactgagatagatgaaaaacctcct 
gaatggcttgataaagatgcaataaaagaatggcatcgcatattacctttgattcaagaa
ctaccaatagcagctttagatatggggttattagccacctattgtcaaacatatagcaat 

tacaagaacgccacgatccaattagaaaaagagggtatgatcgtcgaaaccgaaagagga
acgaaattatctagttattacacagtacaaagagatagcgtaaatgctatgaactccatt 
tgtcctaaattaggattgacagttgagtcacgtttaaaaatattgtcgccagatacaaag 
aaagaaaagaaagatgaatttgaggacttaatgaatggcaaagattag 

Sequence 3088

MPPRKLLSQQKGNLTVEQQENKENAEKAMAQLTEIDEKPPEWLDKDAIKEWHRILPLIQE
LPIAALDMGLLATYCQTYSNYKNATIQLEKEGMIVETERGTKLSSYYTVQRDSVNAMNSI 
CPKLGLTVESRLKILSPDTKKEKKDEFEDLMNGKD* 

Sequence 3089

Contig_0613_pos_4909_6483

is similar to (with p-value 6.0e-23) 

>sp:sp|P14597|DUT_ORFN2 DEOXYURIDINE 5'-TRIPHOSPHATE NUCLEO
TIDOHYDROLASE (EC 3.6.1.23) (DUTPASE) (DUTP PYROPHOSPHATASE)
. >gp:gp|M30023|ORFPRTPS_1 Orf virus homologue of retroviral
pseudoprotease gene, complete cds. NID: g332561.

gtgaaagcctgtcaacgccatttagatgacttgaacgattcggaactcccttatcacttt 
gatgtaaagaaagctaatcacattattaagtttcttgaaatgttgccagatcctaaaact 
ggtaaacaattatcgttaggcggttttcaaaaattcattgctggtagcttaaatggttgg 
tacgacagacatgggtacaaaagatttacaaaagcctatatatcaatgagcagaaaaaat
ggtaaaacattattgatctctggaatggcattgtacgatttattgatgggtaaagatccg 
ttgaatgaacggttgattggtttgagcgccaattcaagagaccaagctggtatagcatac 
gatatgacattggcacaactgaaagctattagaagcgtttctcctaaggttaaatcgatg 
actaagataacgccaagtgcaaaagaaatattgaatattaatgatcgaagtaaagttaaa
gccgtttcaaatgaagctgcaaatttagaaggtcatcagtttagctacgcaatcatcgat 
gaatatcatgaagctaaagataaaaagatttatgaaacgttaagacgtgggcaagtgcta 
ccgcacaaccctatattaattattatctcaacagctggaactaatttgaatggtccgatg 
tatgaagaatatttatatattgataagatacttgacggcatagcaaaaaatgaaaactac 

tttgttttctgtgctgaacaagatgatgagaaagaagtatatgacgttaaaacttggatt
aaatccaatccacttatggagttgccagaaatggcacaattgttaactaagaatattcaa 
ccagaagttaaaactgcaattgatagtggttcaggattaaatgggatattaataaagaat 
ttcaatatgtggcgtgcagcaagcacagaatcttatttagatttcaatgattggaagaaa 
aatgaaatagactttgatataaatggctctaaaacttatatcggtttagacttatcgcgt 

gctgacgacttaaccgcagtatcgtttgttcatcttgatgaagataatcaagagtattat
gtaactagtcattcgtttgtggctactaaaggtggattagatggcaagattgatagagac 
tttattgattacagacaacttgcagaaagtggttattgtacgattaccgatttacaaagt 
ggaattatcaatactgaccaagttttaaattacattgagaattatatcgaccaatataaa 
ttagacgtacaagcgttatgttatgatccttactcaatacatggtgttattgcagaaatt 

gagcgtagagattggccttatgatttagtagaaatcagacaagggccacaaacactatct
aatccgatactggattttagactgaaagtgattaatggggacatcaagcatcataaaaat 
ccgttactagacattgcagtcaaaaatgctgtggcaaaagataccaatgactcattaatg 
attgaaaagaagatgaaccgagaaaaaatagatccactcatggctaccatatttgcttat 
gttataaatagctga 


Sequence 3090 

VKACQRHLDDLNDSELPYHFDVKKANHIIKFLEMLPDPKTGKQLSLGGFQKFIAGSLNGW
YDRHGYKRFTKAYISMSRKNGKTLLISGMALYDLLMGKDPLNERLIGLSANSRDQAGIAY 
DMTLAQLKAIRSVSPKVKSMTKITPSAKEILNINDRSKVKAVSNEAANLEGHQFSYAIID 

EYHEAKDKKIYETLRRGQVLPHNPILIIISTAGTNLNGPMYEEYLYIDKILDGIAKNENY
FVFCAEQDDEKEVYDVKTWIKSNPLMELPEMAQLLTKNIQPEVKTAIDSGSGLNGILIKN
FNMWRAASTESYLDFNDWKKNEIDFDINGSKTYIGLDLSRADDLTAVSFVHLDEDNQEYY
VTSHSFVATKGGLDGKIDRDFIDYRQLAESGYCTITDLQSGIINTDQVLNYIENYIDQYK
LDVQALCYDPYSIHGVIAEIERRDWPYDLVEIRQGPQTLSNPILDFRLKVINGDIKHHKN

PLLDIAVKNAVAKDTNDSLMIEKKMNREKIDPLMATIFAYVINS*

Sequence 3091
Contig_0619_pos_4899_6257
>sp:sp|O69282|MQO_CORGL MALATE:QUINONE OXIDOREDUCTASE (EC 1
.1.99.16) (MALATE DEHYDROGENASE (ACCEPTOR)) (MQO).

gtgagtcaacccggcgaagaaagttcaaatgtttggaataatgcgggaacaggtcattca 
gcattgtgtgaattgaactatacgaaagaaggtaaagatggttcagtagatattactaaa 
gcaattcatattaacgagcaatttcaaatatctaaacagttttgggcttatttaatacgt 
gaaggtcatattgaaagtccagataaatttattcaatcagtgccacatatgagctttgtt 

aaaggggaagaaaatgttaaatttttaaaaagtcgagtggcgagtttacagaaaaatgta
ttatttgaaaaaatgaaaatttctcaagatccagaaaaaattaactcatgggttccttta 
atgatggaaggacgccaatcagatgaagcaattgccattacgtatgacgagacaggtaca 
gatgttaactttggtgctttgactaaaaagttaatagctaatttacaacaaaaaaatgtt 
ggcattaattataaacatgaagttttagatataaaaaaattaaataatggtaactggcaa 

gttgtggttaaagatttaaatacatcaaatgtaatgaattatgaatctaagttcgtcttc
atcggagctggtggtgcaagtttacctttattacaaaaaacaaagattaaggaatctaaa 
cacattggtggtttcccagtaagtggattatttttacgatgtaaaaatccagatgtcata 
catagacatcatgcaaaagtctacggtaaagccgaggttggtgcacctccaatgtcagtt 
ccacatttagatacacgatttgttaatggtgaaaaatcattactatttggaccttttgca 

gggttttcgccaaaattcttaaaaaacggttcatatttagatttagttaaatctgtgaaa
cccaataatatgataacaatgttaagtgctggcgtaaaagaatttaatttgacgaaatat 
ttagtttctcaattaatgctttcaaatgaagaacggatcaatgatttgcgtgtattctta 
ccagaagcgaaagatgaagattgggaagtaattactgcaggtcaacgtgttcaagtaatt 
aaagatacagataagtctaaaggtcaattacaatttggtacggaagtaataacatcagaa 
gatggttcacttgctgcattattaggtgcttcacctggtgcttcgactgctgttgatatc
atgtttgatgtcttgcaacgttgttacaaatcagagtttaagtcatgggaaccaaaaatt 
aaagaaatggtcccatcatttggtttaaaattgtcagagcatgaagatatgtaccattca 
ataaacgaagaagtaaaaaaatacttgaatgtaaagtag 


Sequence 3092 

VSQPGEESSNVWNNAGTGHSALCELNYTKEGKDGSVDITKAIHINEQFQISKQFWAYLIR 
EGHIESPDKFIQSVPHMSFVKGEENVKFLKSRVASLQKNVLFEKMKISQDPEKINSWVPL 
MMEGRQSDEAIAITYDETGTDVNFGALTKKLIANLQQKNVGINYKHEVLDIKKLNNGNWQ
VVVKDLNTSNVMNYESKFVFIGAGGASLPLLQKTKIKESKHIGGFPVSGLFLRCKNPDVI
HRHHAKVYGKAEVGAPPMSVPHLDTRFVNGEKSLLFGPFAGFSPKFLKNGSYLDLVKSVK 
PNNMITMLSAGVKEFNLTKYLVSQLMLSNEERINDLRVFLPEAKDEDWEVITAGQRVQVI 
KDTDKSKGQLQFGTEVITSEDGSLAALLGASPGASTAVDIMFDVLQRCYKSEFKSWEPKI 
KEMVPSFGLKLSEHEDMYHSINEEVKKYLNVK* 


Sequence 3093 
Contig_0620_pos_2802_3113
is similar to (with p-value 7.0e-21)
>pir:pir|A00461|DEECR NADH dehydrogenase (EC 1.6.99.3) - Es
cherichia coli >gp:gp|D90746|D90746_3 Escherichia coli genom
ic DNA.(24.9 - 25.3 min). NID: g1651543. >gp:gp|V00306|ECNDH
X_1 E. coli gene ndh coding for respiratory NADH dehydrogena
se (a component of the electron transport chain). This enzym
e catalyses the transfer of electrons from NADH to the respi
ratory chain and thus links the major catabolic and energy-p
roducing pathways of the cell. NID: g42112. >gp:gp|AE000211|
AE000211_7 Escherichia coli K-12 MG1655 section 101 of 400 o
f the complete genome. NID: g1787345.

atgttaaaaaataacgacatgtctgatggttatttaaaaattaaagtaaatggtggagga 
tgcacaggattaacttatggtatgtcagccgaagcagaacctggtgaaaatgatgaaatt
ctcgaatactatggtttgaaagttctagtagaccgaaatgatgctcctgtattaaatgga 
acaacaattgattttaaacagtcacttatgggtggaggatttcaaataaacaatcctaat 
gctattgcctcatgtggttgtggaagttcatttaaaacagctaaagtcgctggaaatcca 
gagcaatgttaa 


Sequence 3094 

MLKNNDMSDGYLKIKVNGGGCTGLTYGMSAEAEPGENDEILEYYGLKVLVDRNDAPVLNG 
TTIDFKQSLMGGGFQINNPNAIASCGCGSSFKTAKVAGNPEQC* 

Sequence 3095

Contig_0620_pos_3383_4507 

is similar to (with p-value 9.0e-18) 
>sp:sp|Q44540|YNIU_AZOVI HYPOTHETICAL 11.0 KD PROTEIN IN NI
FU 5'REGION (ORF6). >gp:gp|M20568|AVINIFC_16 A.vinelandii ma
jor nif gene cluster encoding nitrogen fixation complex, com
plete cds. NID: g758356.

atggctcaagatcgtaagaaagtgctcgtattaggtgcaggttatgctggtttacaaact 
gtaactaaattacaaaaagaactttctgctgatgcagcggaaattactttaattaataag 
aatgaatatcattatgaatcaacttggttgcatgaagcttctgccggtacgattaattat

gaagatttattgtatcctgttgagaaaactgtcaacaaaaataaagtgaattttgttgtt
gctgaggtaacaaaaattgatcgtaatgctaaacgtgtagaaactgataagggtgtttat 
gactttgatatcttagttgttgcactaggttttgttagcgaaacatttggtattgatggt 
atgaaagaacatgctttccaaattgagaacgttttaacttctcgtaagttgtctcgtcac 
attgaagataagttcgcaaattatgctgcttctaaagaaaaagatgataaagatttatct 

attttagttgggggagctggatttacaggaattgaatttctaggtgaattaactgataga
attcctgaattatgcagtaaatatggtgttgatcaaagtaaagtgaagttaacatgtgtt 
gaagcagcacctaaaatgttaccgatgttctcagacgacttagttagttatgcagtaaaa 
tatttagaagaccgtggagtagaattcaaaattgcaacacctattgtcgcttgtaatgaa 
aaaggtttcgttgttgaagtcaatggagaaaaacaacaattagaagccggaacttctgta 
tggactgctggagtgcgtggaagtcatttaatggaagaatcatttgaaggtgttaaacgt
ggacgtattatcaataaacaagatttaacaattgaaggtcataatgacatctttgttata 
ggagattgttcagcgtttattccagctggtgaagagcgtccattaccaacaacagctcaa 
attgctatgcaacaaggtgagcatactgctagcaacattaaacgtttattaaatggtgaa 
tcaacacaagatttccaatatgttaacctaacgatcaagatgcagttgaaaggaaagaga
aagaaactgagtttcaaaaacaacaagatgaagaaattgctttaa 

Sequence 3096 

MAQDRKKVLVLGAGYAGLQTVTKLQKELSADAAEITLINKNEYHYESTWLHEASAGTINY
EDLLYPVEKTVNKNKVNFVVAEVTKIDRNAKRVETDKGVYDFDILVVALGFVSETFGIDG
MKEHAFQIENVLTSRKLSRHIEDKFANYAASKEKDDKDLSILVGGAGFTGIEFLGELTDR
IPELCSKYGVDQSKVKLTCVEAAPKMLPMFSDDLVSYAVKYLEDRGVEFKIATPIVACNE
KGFVVEVNGEKQQLEAGTSVWTAGVRGSHLMEESFEGVKRGRIINKQDLTIEGHNDIFVI
GDCSAFIPAGEERPLPTTAQIAMQQGEHTASNIKRLLNGESTQDFQYVNLTIKMQLKGKR
KKLSFKNNKMKKLL*

Sequence 3097 
Contig_0620_pos_2198_1134 
is similar to (with p-value 7.0e-29) 
>gp:gp|Y09899|CVPME131_3 C.viguieri phS gene, gene encoding
putative NADH dehydrogenase and two genes encoding unknown
proteins. NID: g2765033.

atgaaaaacttagtattactaggcgggggctatggtaatatgcgaattatgtcgcgcatt 
ttacctcattcaattcctgagggatatcacttaactttaatcgaccgcatgccattccac 

ggtttaaaacctgaattttatgcacttgcagcaggaactaaatctgacaaagaggtgcga
atccaatttccagatagcagtcaaattaatacggtttatggggaaatcagtgatatagat 
ttggacgaacaaatgataacagttggaaattcaaaaatagattatgacgaacttatcatt 
ggtctagggtgtgaggataaatatcataatgtccctggtgctgaagcatatacacatagc 
attcaaacattatctaaatcgcgtgaaacataccatagaattagcgagttacctaaaggg 

gcacgtgtaggtatcgttggggcaggtttaagtggcattgaattagcgagcgagctacgt
gaaagtcgatcagacttggaaattttgttatatgatagagggcctcgaattttaaggaat 
tttccagagaaactgagtaaatacatatctaattggttttctaaacacaatgttactgta 
gtacctaattcagtcatcgacagagtagaacccggaaaaatttataataatggtaaacca 
gaaaatattgatttagtcgtttggacagcaggcatacaacctgttgaaattgtgcgtaat 

cttcctattgatatgagtaccactggacgcgtaattattaatcagtaccatcaagtccca
acctatagaaatgtttatgtcgtaggtgactgtgctaatttaccacatgcacccagtgct 
caactagcagaactacaaggtgaacagattgctgaggtgttgaagaagcaatggaataac
gaaccacttccagataaaatgcctgaaattaaagtacaaggctttttaggctctttaggt 
gacaaacaaggttttgcttatatcatggatcgaacagttaccggacgattagcctctatt 

ctaaaatcaggtgttctgtggcgctataaatatcataatggttaa

Sequence 3098 

mknlvllgggygnmrimsrilphsipegyhltlidrmpfhglkpefyalaagtksdkevr 
iqfpdssqintvygeisdidldeqmitvgnskidydeliiglgcedkyhnvpgaeayths 
iqtlsksretyhriselpkgarvgivgaglsgielaselresrsdleillydrgprilrn
fpeklskyisnwfskhnvtvvpnsvidrvepgkiynngkpenidlvvwtagiqpveivrn
lpidmsttgrviinqyhqvptyrnvyvvgdcanlphapsaqlaelqgeqiaevlkkqwnn
eplpdkmpeikvqgflgslgdkqgfayimdrtvtgrlasilksgvlwrykyhng* 

Sequence 3099

Contig_0622_pos_4948_4412 

is similar to (with p-value 1.0e-61)

>gp:gp|M57622|BACRGB_1 B.stearothermophilus ribosomal prote
in L6 gene, complete cds. NID: g143418.

atgagtcgtgttggtaagaaaattattgacattcctagtgacgtaacagtaacttttgac
ggaagtcatgtcactgtaaaaggtccaaaaggtgaattagaaagaactttaaatgaaaga 
atgacatttaaacaagaagaaaacactgttgaagttgtaagaccatctgattctaaagaa 
gacagaacagatcatggtacaactcgtgctttattaaataatatggtactaggtgtttct 
caaggttacgaaaaaacacttgagcttgttggtgtaggttaccgtgcacaaatgcaaggt 
aaagatttagtacttaatgttggatactctcacccagttgaaattaaagcagaagaaggc
attactttcgctgttgagaaaaatacaactgttaaagtatctggtgtttctaaagaacaa 
gttggtgcgattgcttctaacattcgttctgtaagacctccagaaccttataaaggtaaa 
ggtattcgctaccaaggtgaatatgtacgccgtaaagaaggtaaaactggtaaataa 


Sequence 3100 

MSRVGKKIIDIPSDVTVTFDGSHVTVKGPKGELERTLNERMTFKQEENTVEVVRPSDSKE
DRTDHGTTRALLNNMVLGVSQGYEKTLELVGVGYRAQMQGKDLVLNVGYSHPVEIKAEEG 
ITFAVEKNTTVKVSGVSKEQVGAIASNIRSVRPPEPYKGKGIRYQGEYVRRKEGKTGK* 


Sequence 3101 

Contig_0623_pos_10299_9358
is similar to (with p-value 7.0e-34) 
>gp:gp|AF017231|AF017231_1 Trypanosoma brucei brucei inosin
e-adenosine-guanosine-nucleoside hydrolase mRNA, complete cd
s. NID: g2645494. 

atgaccaaagtatattttaatcatgatggtggcgttgatgatctagtgtcactattttta 
ttattacaaatggaaaatatagaacttgttggcgtaagtacgattggtgcagactgctat 

ttagagccttcattaagtgcttcattaaagattataaatcgtttttcagacgttgaaatt 
aatgtagcaccatcatatgaaagagggaaaaatccttttccaaaagaatggagaatgcat
gctttctttatggatgccttaccagtgctaaatgagtcttgtatacccaaaagatgcaag 
gctagtgaggacgaggcgtatatagatattattcgtaaagtgaagagttgtgatgagaaa 
gttacattgttatttactggaccgcttacagatttagctaaagctataaaatatgacaac 
tcaattttaaaaaatatagagaaattagtctggatgggtggaacgtttttagacaaagga 
aatgttgaagaaccagaacatgatggtacagctgaatggaatgcattttgggatcctgag
gctgtaaaagttgtattagatagtgatatgaatgtcgatattgttgctttagaaagtaca 
aatcaagtccctctaacaatggaagttcgtcaaatgtgggcagataaaagacaatattta 
ggcgttgattttctgggcacaagttacgcagcagtaccaccactcacacattttgtgacc 
aattcaacatactttttatgggatgtattaactactgcttatgtgggttctccaaattta 
gttgaatcaacgaaattgaaaattgatgtagtcagtcaaggacctagtcaagggagaaca
ttccaatctgaatatggacgtgaagttcaagtcattacggatgtaaataaacaagcattt
tttaactacataacggatttagcaaagaaaatcgagtcctaa 

Sequence 3102 

MTKVYFNHDGGVDDLVSLFLLLQMENIELVGVSTIGADCYLEPSLSASLKIINRFSDVEI
NVAPSYERGKNPFPKEWRMHAFFMDALPVLNESCIPKRCKASEDEAYIDIIRKVKSCDEK
VTLLFTGPLTDLAKAIKYDNSILKNIEKLVWMGGTFLDKGNVEEPEHDGTAEWNAFWDPE 
AVKVVLDSDMNVDIVALESTNQVPLTMEVRQMWADKRQYLGVDFLGTSYAAVPPLTHFVT
NSTYFLWDVLTTAYVGSPNLVESTKLKIDVVSQGPSQGRTFQSEYGREVQVITDVNKQAF
FNYITDLAKKIES*

Sequence 3103 

Contig_0625_pos_635_1573 

is similar to (with p-value 4.0e-70) 

>sp:sp|P54524|YQIG_BACSU PROBABLE NADH-DEPENDENT FLAVIN OXI
DOREDUCTASE YQIG (EC 1.-.-.-). >gp:gp|D84432|BACJH642_230 Ba
cillus subtilis DNA, 283 Kb region containing skin element.
NID: g2627063. >gp:gp|Z99116|BSUB0013_132 Bacillus subtilis
complete genome (section 13 of 21): from 2395261 to 2613730.
NID: g2634723.

gtggaaagggggagtcaaaaggtgaaaacgctaatgattaaagcaatggggacggtgata 
cgtttatcgattgagcatcaacatccggatacattacttcaagaagctgaaataaaaatt 
cgtgcttgggaatcacaatttagtgctaatgatccgaaatcagatttgatgaatgtgaat 
cagcatgcaggtatcgcaccagtcaaggttagttctgagatgtttaacatgatacgtttt 

ggttacgaaactacattatcttctaattttaagatgaacattttgatagggccactagtc
aaattatggaaaattggttttaaagatgcattgaaacctaaagaagaggatatacaacgt 
gctttattgtgtatggatcctgaaaatcttgttctaaattcaaaaacacatgaagtattt
cttacacaatcaggaatggagattgatttaggagctatagttaaaggctattttgctgat 
caattacagcaatactttttagctcatggtgtatcttctggcattatcgatttaggtggt 
aatgttttaacaattggtagacaacccgaaacattagaaaaatggcatgtaggtgtacgt
aatccatttcataaggatacactaccactcgttacattaagcgtagagcatcaatcagtt 
gtcacatcaggtatctacgaacgctacttcatacaggaaaatcaattatttcatcatata 
ttggattcaacaacaggttatcctgtagataatgatatcgctagcgtgacaatcatatct 
gatcatgggattgatggcgaggtatggagtacaatttgtagttttggtcagtcacaaaaa
aatattgaattattaaatctcattgacggtattgaaggcattattgtgacaagagatgga 
agcgttttaatgacttcgaaaatgcaaaagtatttataa

Sequence 3104 

VERGSQKVKTLMIKAMGTVIRLSIEHQHPDTLLQEAEIKIRAWESQFSANDPKSDLMNVN
QHAGIAPVKVSSEMFNMIRFGYETTLSSNFKMNILIGPLVKLWKIGFKDALKPKEEDIQR 

ALLCMDPENLVLNSKTHEVFLTQSGMEIDLGAIVKGYFADQLQQYFLAHGVSSGIIDLGG 
NVLTIGRQPETLEKWHVGVRNPFHKDTLPLVTLSVEHQSVVTSGIYERYFIQENQLFHHI
LDSTTGYPVDNDIASVTIISDHGIDGEVWSTICSFGQSQKNIELLNLIDGIEGIIVTRDG 
SVLMTSKMQKYL*

Sequence 3105 
Contig_0625_pos_1591_4674
is similar to (with p-value 8.0e-25) 
>sp:sp|P33944|YOJL_ECOLI HYPOTHETICAL 38.5 KD LIPOPROTEIN I
N ADA-OMPC INTERGENIC REGION PRECURSOR. >gp:gp|AE000310|AE00
0310_6 Escherichia coli K-12 MG1655 section 200 of 400 of th
e complete genome. NID: g2367131.

atgaagagtattttatttttaagtaataatgtgaaaatattcacaaaaaaactaggagga 

tttgctatgagtaaggaaatattcgatacttttaaatttaaatgtggtgccgaattaaaa
aatagagtattaatggcacccatgactatccaagctgggtattttgatggaagtgttaca 
tcagaaatgattgattattatcaatttagagctggtgatgcttcagcaatcattgttgaa 
agttgttttgttgaaaatcacggacgaggatttccgggagctataggtattgataatgat 
gacaaaatacctggactcaaacgtttagcagaagcgattcaagctaagggatcaaaagcg 

attttgcaactttatcatgccggaagaatggcaaatcctaaatttaatgaaggagagcag
ccgatatctgcgagccccattgcagcattaagacctgatgctgtaccacctagagaaatg 
acacatgctcaaatcaatcagatgattgatgactttggagaggctacacgtcgcgctata 
gaagcggggtttgatggtgtcgaaattcatggcgccaacacatacttattacaacaattt 
ttctctccacattctaatcggagacaagattcatggggaggcagtcgtgaaaaacgtaca 

cgatttccaatcgaagttttgacaaaggttcaacacgtcgttgctgaaaaagaggcttct
cattttattataggatatcgattctcacctgaagaaattgaagaaccaggcatacgtttt
gaagataccatgtttttactaaatacattagcagaatatgaacctgattacttccatata 
tcagcaaacagttatcaacgtacatctattgtgaatcaagaagatacagaacctttaatt 
aataagtacatcaaaatgcaaagtgcacagttggcaaaaattccattaattggtgtaggt 

agtattgcccaacgacaagatgcagaacatgcccttgaactaggatatgatcttttaagt
gttgggaaagcctatttagtggaaccacaatggacagataaaatttcacaaaacgaagaa 
gtagaacaatttgtcgatatacatgatcagaaagtacttcacataccatcccctttatgg 
aaagtaatggactttatgattttagataaagaagaagagcatcgtaaatatgaaaaatta 
aaagcacttcaaaataaaaaagttaaatttaacaaaggtacgtatcatgtctatgcaaaa 

ggtcataatggcaacttacctatgaaagtccaattatcagaagataagattgtaagtatc
gaggtagatgatagcggagagtctgaaggcatagcgaacccagtgtttgaacgtttacct 
caagatattatcaatgggcaaacactgaatgtagatgtcatttcaggtgcgacagtaaca 
agtgaaggcatcgtgcaaggtattgcagatgcaattgaacaagcaggagaagacccagat 
attttacgggcgcgtcctaaaccagtcgttcagtggtctgatgaggttgttgaagagacg 

actgatgtcgttgtgattggtacaggaggtgccggactcagtgcagctgctacggcatta
gatgaaggaaaagaagtcatcatgcttgagaaatttgcagctataggtggcaatactatc 
cgtacaggtggtcaagtcaacgctgctgagcctaaatggcaaaatgcattcccggcactt 
gctggtgaaaaagagacactcatacagttattaaatcatgatgaaaatgatatagatgaa 
gcttacattgaagatttcaatactttaaaacgtcaaattaaagactatcttgaaaatagc 

agtaatgaagatgaatatctttttgattctgtcgaattacatcgtattcaaacatattta
ggtggtaaacgtaaagatcgtaataatgtcgaaatttcaggtgattatgatttagttaaa 
acactcacagataacgttttggaatcagtatattggttgaaagacaaaggtgtacatttt 
gatcgttcgtttgtagatatgcctgtgggtgctttatggcgtcgtggtcataaaccaatg 
aaagcacaaggtttagagtacattgaaaatttaggagactacgttaaacataatcatggt 
cgtatttttacagaaactactgcagaaaagttaattaaagaaggtaatcaagttgttggt
attgaagcacgtaaagcaaatggtgctaaagtgaagattcatacacgtcatggtgtagtg 
ttggctactggtggctttggagcgaatacaaaaatgctacaacaatataatacgtattgg 
gataatattcctgatgacattaagacaacgaattctcctgcaattacaggtgatggtata 
cgtttaggtgtgcaggcaggcgctgacatcgtaggtatgggattctctcaaatgatgccg
atttctgatccgaagacaggtgcattgtttactggattaattgtaacaccttcaaacttt 
gtcttcgttaataaggaaggacagcgttttgttaacgaatttgaaagtagagatgtatta 
tctaaggcagcattagaacaaaaagacggtatcttctatattattgcagatgcaaatatt 
aaagcactagctatgaatacaactgaggataaaattaatcaagaattagaagacggcact 

ttagtaaaagcagataccttagaagcattagcccaaaaattaaacattgatacaactact
tttgtgaacacgattgaaagatataataccttcgtagaacaaggacaagatgaggatttc 
aataaaaatgcatttgatttaaaaattgaaaaagcaccattctacgcgacaccacgtaaa 
cctgcaatacatcatactatgggtggtttaaaaataaacacgcatgcacaagttatagat 
gttgaaggtcatatcattgaaggtttatatgcggctggtgaagttgccggtggtattcat

gctggtaaccgtttaggcggaaatgcactggcagatatttttacttttggtcgcattgcc
ggtcaaagtgctgtaacgaaataa 

Sequence 3106 

MKSILFLSNNVKIFTKKLGGFAMSKEIFDTFKFKCGAELKNRVLMAPMTIQAGYFDGSVT 

SEMIDYYQFRAGDASAIIVESCFVENHGRGFPGAIGIDNDDKIPGLKRLAEAIQAKGSKA
ILQLYHAGRMANPKFNEGEQPISASPIAALRPDAVPPREMTHAQINQMIDDFGEATRRAI 
EAGFDGVEIHGANTYLLQQFFSPHSNRRQDSWGGSREKRTRFPIEVLTKVQHVVAEKEAS
HFIIGYRFSPEEIEEPGIRFEDTMFLLNTLAEYEPDYFHISANSYQRTSIVNQEDTEPLI 
NKYIKMQSAQLAKIPLIGVGSIAQRQDAEHALELGYDLLSVGKAYLVEPQWTDKISQNEE 

VEQFVDIHDQKVLHIPSPLWKVMDFMILDKEEEHRKYEKLKALQNKKVKFNKGTYHVYAK
GHNGNLPMKVQLSEDKIVSIEVDDSGESEGIANPVFERLPQDIINGQTLNVDVISGATVT 
SEGIVQGIADAIEQAGEDPDILRARPKPVVQWSDEVVEETTDVVVIGTGGAGLSAAATAL
DEGKEVIMLEKFAAIGGNTIRTGGQVNAAEPKWQNAFPALAGEKETLIQLLNHDENDIDE 
AYIEDFNTLKRQIKDYLENSSNEDEYLFDSVELHRIQTYLGGKRKDRNNVEISGDYDLVK 

TLTDNVLESVYWLKDKGVHFDRSFVDMPVGALWRRGHKPMKAQGLEYIENLGDYVKHNHG
RIFTETTAEKLIKEGNQVVGIEARKANGAKVKIHTRHGVVLATGGFGANTKMLQQYNTYW
DNIPDDIKTTNSPAITGDGIRLGVQAGADIVGMGFSQMMPISDPKTGALFTGLIVTPSNF
VFVNKEGQRFVNEFESRDVLSKAALEQKDGIFYIIADANIKALAMNTTEDKINQELEDGT
LVKADTLEALAQKLNIDTTTFVNTIERYNTFVEQGQDEDFNKNAFDLKIEKAPFYATPRK 

PAIHHTMGGLKINTHAQVIDVEGHIIEGLYAAGEVAGGIHAGNRLGGNALADIFTFGRIA
GQSAVTK*

Sequence 3107 
Contig_0625_pos_7554_4822
is similar to (with p-value 3.0e-18)

>gp:gp|AF061185|AF061185_1 Phytophthora infestans cyst germ
ination specific acidic repeat protein precursor (car90) gen
e, complete cds. NID: g3851513.

gtggatgaaatcgttcattatggtggcgaagaaatcaagccaggccataaggatgaattt 
gatccaaatgcaccgaaaggtagtcaaacaacgcaaccaggtaagccgggggttaaaaat
cctgatacaggcgaagtagttactccacctgtggatgatgtgacaaaatatggtccagtt 
gatggagatccgatcacgtcaacggaagaaattccattcgacaagaaacgtgaattcaat 
cctgatttaaaaccaggtgaagagcgtgttaaacaaaaaggtgaaccaggaacaaaaaca
attacaacaccaacaactaagaacccattaacaggggaaaaagttggcgaaggtgaacca 
acagaaaaaataacaaaacaaccagtagatgaaatcacagaatatggtggcgaagaaatc
aagccaggccataaggatgaatttgatccaaatgcaccgaaaggtagccaagaggacgtt 
ccaggtaaaccaggagttaaaaaccctgatacaggcgaagtagtcacaccaccagtggat 
gatgtgacaaaatatggtccagttgatggagatccgatcacgtcaacggaagaaattcca 
ttcgacaagaaacgtgaattcaatcctgatttaaaaccaggtaaagagcgcgttaaacag 
aaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaacccattaacaggg
gaaaaagttggcgaaggtgaaccaacagaaaaagtaacaaaacaaccagtagatgaaatc 
acagaatatggtggcgaagaaatcaagccaggccataaggatgaatttgatccaaatgca 
ccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaatcctgatacaggc 
gaagtagttactccaccagtggatgatgtgacaaaatatggtccagttgatggagatccg 
attacgtcaacggaagaaattccgtttgataaaaaacgcgaatttgatccaaacttagcg
ccaggtacagagaaagtcgttcaaaaaggtgaaccaggaacaaaaacaattacaacacca 
acaactaagaacccattaacaggggaaaaagttggcgaaggtgaaccaacagaaaaagta 
acaaaacaaccagtggatgaaatcgttcattatggtggcgaagaaatcaagccaggccat 
aaggatgaatttgatccaaatgcaccgaaaggtagccaagaggacgttccaggtaaacca
ggagttaaaaaccctgatacaggcgaagtagtcacaccaccagtggatgatgtgacaaaa 
tatggtccagttgatggagatccgatcacgtcaacggaagaaattccgtttgataaaaaa 
cgcgaatttgatccaaacttagcgccaggtacagagaaagtcgttcaaaaaggtgaacca 
ggaacaaaaacaattacaacaccaacaactaagaacccattaacaggggaaaaagttggc 

gaaggtgaaccaacagaaaaagtaacaaaacaaccagtagatgaaatcgttcattatggt
ggcgaagaaatcaagccaggccataaggatgaatttgatccaaatgcaccgaaaggtagc 
caagaggacgttccaggtaaaccaggagttaaaaatcctgatacaggcgaagtagtcaca 
ccaccagtggatgatgtgacaaaatatggtccagttgatggagatccgattacgtcaacg 
gaagaaattccattcgacaagaaacgtgaattcaatcctgatttaaaaccaggtgaagag 

cgtgttaaacagaaaggtgaaccaggaacaaaaacaattacaacaccaacaactaagaac
ccattaacaggggaaaaagttggcgaaggtgaaccaacagaaaaaataacaaaacaacca 
gtagatgaaatcacagaatatggtggcgaagaaatcaagccaggccataaggatgaattt 
gatccgaacgcaccgaaaggtagccaagaggacgttccaggtaaaccaggagttaaaaat 
cctgatacaggcgaagtagtcacaccaccagtggatgatgtgacaaaatatggtccagtt 

gatggagatccgattacgtcaacggaagaaattccgtttgataaaaaacgcgaatttgat
ccaaacttagcgccaggtacagagaaagtcgttcaaaaaggtgaaccaggaacaaaaaca 
attacaacaccaacaactaagaacccattaacaggagaaaaagttggcgaaggtgaacca 
acagaaaaaataacaaaacaaccagtggatgagatcgttcattatggtggcgaagaaatc 
aagacaggccataaggatgaatttgatccgaacgcaccgaaaggtagtcaaacaacgcaa 

ccaggtaagccaggagttaaaaatcctgatacaggcgaagtagtcacaccaccagtggat
gatgtgacaaaatatggtccagttgatggagatccgattacgtcaacggaagaaattccg 
tttgataaaaaacgcgaatttgatccaaacttagcgccaggtacagagaaagtcgttcaa 
aaaggtgaaccaggaacaaaaacaattacaacgccaacaactaagaacccattaacaggg 
gaaaaagttggtgaaggtgaaccaactctaaagacacctgttaagagtgatgtcagactg 

accgcaatattaacaatactcatatctacatga

Sequence 3108 

VDEIVHYGGEEIKPGHKDEFDPNAPKGSQTTQPGKPGVKNPDTGEVVTPPVDDVTKYGPV
DGDPITSTEEIPFDKKREFNPDLKPGEERVKQKGEPGTKTITTPTTKNPLTGEKVGEGEP 

TEKITKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPDTGEVVTPPVD
DVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGKERVKQKGEPGTKTITTPTTKNPLTG 
EKVGEGEPTEKVTKQPVDEITEYGGEEIKPGHKDEFDPNAPKGSQEDVPGKPGVKNPDTG
EVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPGTEKVVQKGEPGTKTITTP
TTKNPLTGEKVGEGEPTEKVTKQPVDEIVHYGGEEIKPGHKDEFDPNAPKGSQEDVPGKP 

GVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFDPNLAPGTEKVVQKGEP
GTKTITTPTTKNPLTGEKVGEGEPTEKVTKQPVDEIVHYGGEEIKPGHKDEFDPNAPKGS
QEDVPGKPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFNPDLKPGEE
RVKQKGEPGTKTITTPTTKNPLTGEKVGEGEPTEKITKQPVDEITEYGGEEIKPGHKDEF 
DPNAPKGSQEDVPGKPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIPFDKKREFD 

PNLAPGTEKVVQKGEPGTKTITTPTTKNPLTGEKVGEGEPTEKITKQPVDEIVHYGGEEI
KTGHKDEFDPNAPKGSQTTQPGKPGVKNPDTGEVVTPPVDDVTKYGPVDGDPITSTEEIP
FDKKREFDPNLAPGTEKVVQKGEPGTKTITTPTTKNPLTGEKVGEGEPTLKTPVKSDVRL
TAILTILIST* 

Sequence 3109

Contig_0627_pos_5488_4337 

is similar to (with p-value 2.0e-86)
>sp:sp|P77364|YBBZ_ECOLI HYPOTHETICAL 38.7 KD PROTEIN IN GI
P-FDRA INTERGENIC REGION. >gp:gp|U82664|ECU82664_110 Escheri
chia coli minutes 9 to 11 genomic sequence. NID: g1773084. >
gp:gp|U89279|ECU89279_5 Escherichia coli glyoxylate induced
proteins GlxB1, GlxB2, GlxB3, GlxB4, GlxB6, GlxB7 and GlxB8,
and glycerate kinase GlxB5 genes, complete cds. NID: g27352
35. >gp:gp|AE000157|AE000157_8 Escherichia coli K-12 MG1655
section 47 of 400 of the complete genome. NID: g1786716.
atgtcaccatataaagttatcattgcacctgattcttttaaagaaagtatgtcggcgaag 

gaagctgctttagctattaaagatggattccaagaggtgttcgattccagtacaatatat 
gacattattcctatggctgatggtggcgagggaacaaccgaagtattgaaagaagcctta 
aatgctacctcttattgtgtagaagtaaaagatccacttaatagaaatatcatggctagt
tatgcgagaagcgacgaacaccaaacagctatcattgaaatggcagctgcttcaggacta 
gcattattgagtaaagatgaaagagatccatctattacaacttcgtacggtaccggccaa 
ctcattaatgatgcacttaatcacgatgttaataaaattattttaggaataggtggaagt 
gccacgaatgatggtggtgtaggaatgttaaaggctttaggtgtctcttttaaagataaa 

aacaatcaagagattcgcgatggaggtttagccctatctcaaatagaatacattgatatt
actcgtataaacccacgattgaaagatgtgaatattaaagtagcctgtgatgtaactaat 
ccattattaggagacaatggagcaacaatagtttatggtccacaaaaaggcgctcagcaa 
aagatgataccaaagttggattcagcattacgtcactatcatgataaaattgaaagagaa 
ttaaatatgaatgtaaaagatatcccgggcgctggtgctgcaggaggcatgggaactgca 

ttaatcgcgtttctaaacgctaaattacgtcctggaattgatgtagttcttgaagagact
caatttaaacaaaggataaaagatgcgaatttagttgttactggcgaaggtaaaatggat 
aaacaaacaatctatggcaaaacacccattggcgtagccaaagttgcaaaatcatatgat 
atacccgtcattgctatttgtggtagtttaggaaaagattacgaagcaatttatcaccac 
ggtatcgatagcgtgtttagtatcatggaacgtccatgccaccttgacgaagctttgaaa 

gaaggcgcacttcatgttaaacatacaacaataaatatcgcacgacttttacaagtaaaa
attgaaaaatga 

Sequence 3110 

MSPYKVIIAPDSFKESMSAKEAALAIKDGFQEVFDSSTIYDIIPMADGGEGTTEVLKEAL 
NATSYCVEVKDPLNRNIMASYARSDEHQTAIIEMAAASGLALLSKDERDPSITTSYGTGQ

LINDALNHDVNKIILGIGGSATNDGGVGMLKALGVSFKDKNNQEIRDGGLALSQIEYIDI 
TRINPRLKDVNIKVACDVTNPLLGDNGATIVYGPQKGAQQKMIPKLDSALRHYHDKIERE 
LNMNVKDIPGAGAAGGMGTALIAFLNAKLRPGIDVVLEETQFKQRIKDANLVVTGEGKMD
KQTIYGKTPIGVAKVAKSYDIPVIAICGSLGKDYEAIYHHGIDSVFSIMERPCHLDEALK 
EGALHVKHTTINIARLLQVKIEK*

Sequence 3111 
Contig_0630_pos_4111_3176
is similar to (with p-value 3.0e-25) 
>sp:sp|P19452|HUTG_KLEAE FORMIMINOGLUTAMASE (EC 3.5.3.8) (F
ORMIMINOGLUTAMATE HYDROLASE) (HISTIDINE UTILIZATION PROTEIN
G) (FRAGMENT). >gp:gp|M34604|KPNHUTC_1 K.aerogenes histidine
utilization repressor C (hutC) gene, complete cds. NID: g14
9203.

atgtatcaacttgcacaatctaatctatggacaggtcgtttagatagtgaaactgatcct
acacaatttagacacttccaaactgttaaattcggtgatttaagtcaattagatttttcg 
gatgaacacaaaggcgtgggcttattaggatatgcaattgataaaggagtagaattaaac 
aaaggacgtgtaggtgcaaaagaaggtcccaatgccattaagcgagcttttgctggattg 
ccagatttgaatcaatgtgaagagattatagattatggtaatgtagaacacaatcatgag 

ttgctaatagatacacagcgcgaattcgcagatcttgctgctaagtctatcaaacgacat
aaacaaacatttttacttggtggcggtcatgatatagcatatgcacaatatttagctact 
cgtaaagtttatcctgagtcgtcaataggtgtgattaatatagatgcgcactttgacaca 
cgcgatgagggttattcaacctctggtactagttttagacagattctagaagaagatgat 
aatgcagattatttagtgttaggtatatctcaaggtggtaatacacaagctttatttaat 

tatgctaaagaaaaagatattcaatttgtatatgcagatgaattactacatcaggtatct
ccccccattaaagatatgatagaacgttttatccataatcatgatacggttatgttcaca 
atttgcatggatgtagtagatagtgcatttgcaccaggagtcagtgcaccagctgtccta 
ggaatatatccacatacagttttcgaacttgctaaacgggtcattccaagtgaaaaagta 
aaatctataagtatagctgaaatgaatccgacgtatgattcagatcaaagaactgctaaa 

ttagttgctaatttagtacatcattgtttaatttaa

Sequence 3112 

MYQLAQSNLWTGRLDSETDPTQFRHFQTVKFGDLSQLDFSDEHKGVGLLGYAIDKGVELN 
KGRVGAKEGPNAIKRAFAGLPDLNQCEEIIDYGNVEHNHELLIDTQREFADLAAKSIKRH
KQTFLLGGGHDIAYAQYLATRKVYPESSIGVINIDAHFDTRDEGYSTSGTSFRQILEEDD
NADYLVLGISQGGNTQALFNYAKEKDIQFVYADELLHQVSPPIKDMIERFIHNHDTVMFT 
ICMDVVDSAFAPGVSAPAVLGIYPHTVFELAKRVIPSEKVKSISIAEMNPTYDSDQRTAK

LVANLVHHCLI* 


Sequence 3113 

Contig_0634_pos_2695_2021

>gp:gp|U40385|SEU40385_1 Staphyloccous epidermidis plasmid
pSK818 insertion sequence IS257(818B) putative transposase g
ene, complete cds. NID: g1762099. >gp:gp|U40386|SEU40386_1 S
taphyloccous epidermidis plasmid pSK818 insertion sequence I
S257(818C) putative transposase gene, complete cds. NID: g17
62101.

atgaactatttcagatataaacaatttaacaaggatgttatcactgtagcggttggctac 
tatctaagatatgcattgagttatcgtgatatatctgaaatattaagggaacgtggtgta
aacgttcatcattcaacggtctaccgttgggttcaagaatatgccccaattttatatcaa
atttggaagaaaaagcataaaaaagcttattacaaatggcgtattgatgagacgtacatc 
aaaataaaaggaaaatggagttatttatatcgtgccattgatgcagagggacatacatta 
gatatttggttgcgtaagcaacgagataatcattcagcatatgcgtttatcaaacgtctc 
attaaacaatttggtaaacctcaaaaggtaattacagatcaggcaccttcaacgaaggta
gcaatggctaaagtaattaaaggttttaaacttaaacctgactgtcattgtacatcgaaa 
tatctgaataacctcattgagcaagatcaccgtcatattaaagtaagaaagacaaggtat 
caaagtatcaatacagcaaagaatactttaaaaggtattgaatgtatttacgctctatat 
aaaaagaaccgcaggtctcttcagatctacggattttcgccatgccacgaaattagcatc 
atgctagcaagttaa

Sequence 3114 

MNYFRYKQFNKDVITVAVGYYLRYALSYRDISEILRERGVNVHHSTVYRWVQEYAPILYQ 
IWKKKHKKAYYKWRIDETYIKIKGKWSYLYRAIDAEGHTLDIWLRKQRDNHSAYAFIKRL 
IKQFGKPQKVITDQAPSTKVAMAKVIKGFKLKPDCHCTSKYLNNLIEQDHRHIKVRKTRY
QSINTAKNTLKGIECIYALYKKNRRSLQIYGFSPCHEISIMLAS* 

Sequence 3115 

Contig_0635_pos_3548_4750

is similar to (with p-value 1.0e-24)

>gp:gp|U33059|APU33059_5 Actinosynnema pretiosum auranticum
diaminopimelate decarboxylase (lysA), 3-amino-5-hydroxybenz
oic acid synthase, oxidoreductase, phosphatase, and aminodeh
ydroquinate synthase genes, complete cds; transcription acti
vator gene, partial cds; and unknown genes. NID: g3056877.

gtgtatcagtcagaaaatcaatcattactttttattgttattttaggttcattaacagca 
tttggcccattggctattgatatgtctttacctggactacctaatattagtcatgatttt 
gatatttctgcatctacaactcagcttactatctccttttttatgattggattagcgtta 
ggaaattttttggctggccccatatctgatattactggtagaaaaaaaccattaattttc 

tcactgattatttttactattgcgagtttaggtattatattcgtcacaaatatatggatt
atgattattttacgatttattcaaggattaactggtggtgcaggtgcagtcatctcaaga 
gccattgctagtgatatgtactcaggtaatgcgctaactaaatttttatcattattaatg 
cttgtcaatggcattgcgccaattatcgcacctgcgcttggcggtatcattttaaattat 
gggccatggcgaattgtatttgtaatactaacaatgtttgggattgtcatgttaatagga 

actttatttaaagttcctgagtcgcttgaaaagagcctaagggaaagtagtaacataggt
acgatgctaattaatttcaaagaactttttaaaacaccccgttttgtattacccatgttg 
atacaaggggtgtcatttgtattactatttacttatatttctgcatctccttttatagtt 
caaacaatttatggtttaacgccattaaacttcagtattatgtttgcttttataggcgtt 
acactcattatttcaagccaattaaccggaaaacttgttgactatatagatagattactg

ttgctgagaatcatgtctactatacaagttattggtgttataatcgtatcactaacttta
ctcaaccattggactttttggatactttcttgtggctttgtgattttagtggcaccagtt
acagggattgcaacgttaggattttcgatagcaatggatgagagtaaaggggccaaaggt 
agttcttcaagtttgttgggattgtttcaaactttacttggtggcgtcatctctccactt 
gttggtattaagggagacagtaatgcgataccttatataatcgttatcgttattacagca 
ataattcttatggttttacagttgattaatgtgaagatatttaaaaaagctaaaattcat
tga 

Sequence 3116 

VYQSENQSLLFIVILGSLTAFGPLAIDMFLPGLPNISHDFDISASTTQLTISFFMIGLAL
GNFLAGPISDITGRKKPLIFSLIIFTIASLGIIFVTNIWIMIILRFIQGLTGGAGAVISR 
AIASDMYSGNALTKFLSLLMLVNGIAPIIAPALGGIILNYGPWRIVFVILTMFGIVMLIG 
TLFKVPESLEKSLRESSNIGTMLINFKELFKTPRFVLPMLIQGVSFVLLFTYISASPFIV 
QTIYGLTPLNFSIMFAFIGVTLIISSQLTGKLVDYIDRLLLLRIMSTIQVIGVIIVSLTL 
LNHWTFWILSCGFVILVAPVTGIATLGFSIAMDESKGAKGSSSSLLGLFQTLLGGVISPL
VGIKGDSNAIPYIIVIVITAIILMVLQLINVKIFKKAKIH* 

Sequence 3117 
Contig_0635_pos_1170_214
is similar to (with p-value 3.0e-40)

>sp:sp|P28246|BCR_ECOLI BICYCLOMYCIN RESISTANCE PROTEIN (SU 
LFONAMIDE RESISTANCE PROTEIN).

atgttactttttttatgttttctaattgaattattacttattgttttattatatacgaag 
caatcgtttactcttaatttatttagttttatcttatataccatcattggttttgttatg 

atgacttatcatatggtaactgtatcaataccatatgatatgtttatcattgtaattgta
gcaatgatactgttattgattaaatatcgttatattttcaagttgcaaacaggacgtttt 
tttattttacaacttagtcatcatttttatactgtggggctatttgctgtgagttgttta 
tatataagtactattcccctaattatcattaatagcttagctttatgggccgctatcatt 
gcatttagtacaatttattcatttatcggatacttatcttggtctacagcttttgaaaat 

catcaatattataaacacgtaaagttaattatggtgcttggagctggaattttctctgaa
gaagtgactacgcttcttgctgcaagacttgataaagctttatctgtttatcattcacaa 
cggactaaacctatcatcattgtaagtggtggccaaggtcctgacgaaccaatttcagaa 
gcacttgcgatgaaaagataccttatagctcacaacgttccggaaaaccatatatttatg
gagaatcaatccacgaatacacgaaccaatttcttatactctaaatctatcattcattcg 

atgatgcctacttcaagtcagatgttgtgtgtaacaagtcaatttcatgttttaagagcg
cttaaatttgctaaaaaagctcatctttctttcgatggtattggaagtcgtacaccatac 
cactttttggcacaatctatgattatagactttttgggtttaatgtatcaatataaaaca 
atacttactatttatttcgctatgttgttttggcttgcaatactacaaaccatataa 

Sequence 3118

MLLFLCFLIELLLIVLLYTKQSFTLNLFSFILYTIIGFVMMTYHMVTVSIPYDMFIIVIV 
AMILLLIKYRYIFKLQTGRFFILQLSHHFYTVGLFAVSCLYISTIPLIIINSLALWAAII 
AFSTIYSFIGYLSWSTAFENHQYYKHVKLIMVLGAGIFSEEVTTLLAARLDKALSVYHSQ 
RTKPIIIVSGGQGPDEPISEALAMKRYLIAHNVPENHIFMENQSTNTRTNFLYSKSIIHS 

MMPTSSQMLCVTSQFHVLRALKFAKKAHLSFDGIGSRTPYHFLAQSMIIDFLGLMYQYKT
ILTIYFAMLFWLAILQTI*

Sequence 3119 
Contig_0637_pos_1621_2709
is similar to (with p-value 9.0e-27)

>gp:gp|D45211|ARG0D_1 Arthrobacter sp. gene for opine dehyd
rogenase, complete cds. NID: g1060847.

gtgaaaagaggtaaaaaaatgaaaatagctattgtaggttcaggtaatggtgcagtaact 
gctgcagtggatatggtagataaaggtcatgatgtacgattatattgtcgtaacgaatct 

attagtaaatttgatgtcgccctagaaaaaggtggctttgattttaataatgagggagaa
gagaagtttatagagtttactgatattagtgatgatatggagtatgttttagatggtgca 
gacattgttcaggtaattattccttcatcattcattgaatattatgctaaagtgatgtca
aaatttgtgacgaacgaccatctcattttctttaacattgctgcttcaatgggttcaata 
cgatttatgaatgtattagaagatcgtcatattgatgtccatccacactttgcagaagca 

aatacattaacatatggtacacgtgttgactttaacaatgctaaagtagatttatcttta
aatgttcgtcgggtgttcttttcaacatttgatcgtagtgagttaaatgaaagttatgaa 
aaggtatctaaaatttacgattatcttgtaaaagaagaaagtttacttaaaactaatctt 
gaaaatggtaacccagaagtacatcctggaccaacattattgaacgttggacgtattgat 
tattcagaagagttttctttatataaagaaggcataacaaaacatactgtgagattatta 
catgctattgagatagaacgtttaaatttagggagaaaattaggttttgaattatcgact
gccaaagaatcacgtattcaaaggggttatttagaacggaaagacgaggatgaaccgtta 
aatcgtctttttaatactagtcctgtgttttctcaaattccaggaccgaatcacgttgaa 
aaccgttatttaactgaagatatcgcatatggattagtattatggtctagtttaggacgt 
gtcattgatgtcccgacacctaatatcgatgctgttattatgatagcttcaactattctt
gaacgcgatttctttgaagagggcctcactatcgaggaattaggcttagataaattagga 
ttagagtaa 

Sequence 3120 

VKRGKKMKIAIVGSGNGAVTAAVDMVDKGHDVRLYCRNESISKFDVALEKGGFDFNNEGE
EKFIEFTDISDDMEYVLDGADIVQVIIPSSFIEYYAKVMSKFVTNDHLIFFNIAASMGSI
RFMNVLEDRHIDVHPHFAEANTLTYGTRVDFNNAKVDLSLNVRRVFFSTFDRSELNESYE 
KVSKIYDYLVKEESLLKTNLENGNPEVHPGPTLLNVGRIDYSEEFSLYKEGITKHTVRLL 
HAIEIERLNLGRKLGFELSTAKESRIQRGYLERKDEDEPLNRLFNTSPVFSQIPGPNHVE 

NRYLTEDIAYGLVLWSSLGRVIDVPTPNIDAVIMIASTILERDFFEEGLTIEELGLDKLG
LE* 

Sequence 3121 

Contig_0640_pos_7034_6588

is similar to (with p-value 9.0e-16)

>gp:gp|AB012285|AB012285_1 Photobacterium damsela gene for
sialyltransferase 0160, complete cds. NID: g2988378.
atgctaattacgagacaacaacctattgccaaagatttacgtatgatgatggcagcactc 
aaaatctctactgactttgaacgaatgggggataatgctgctagtatcgctcatatacgt 

ttaagagttaaaataaatgataactatgtgtttacacgtttaaaaaccatgggtaaatta
gcgatgctcatgttagaagatttaaataacgctattagaaataaagatttaccactgata 
aaagaagtcattgagagagatgaagatattgatgatttatacgttaacatcgtcaatacc 
agttacttaattgataatgacccattcgtagctggtcaagcacacttagcagctagacac 
ttagaacgaataggtgatcatataagcaatattgctgaaagtgtttattattatttaaca 

ggccaacattttgaaacttttgattaa

Sequence 3122 

MLITRQQPIAKDLRMMMAALKISTDFERMGDNAASIAHIRLRVKINDNYVFTRLKTMGKL
AMLMLEDLNNAIRNKDLPLIKEVIERDEDIDDLYVNIVNTSYLIDNDPFVAGQAHLAARH 
LERIGDHISNIAESVYYYLTGQHFETFD*

Sequence 3123 

Contig_0642_pos_5096_4452 

is similar to (with p-value 7.0e-22) 

>pir:pir|S56598|S56598 yjjG protein - Escherichia coli >gp:
gp|U14003|ECOUW93_286 Escherichia coli K-12 chromosomal regi
on from 92.8 to 00.1 minutes. NID: g1263172. >gp:gp|AE000507
|AE000507_13 Escherichia coli K-12 MG1655 section 397 of 400
of the complete genome. NID: g2367380.

gtggatttttatgacgctgagaaaaaagcgttttataatttagcgcagaaatacaatcat
cagccaactcaacaggatttcgaacattttaagaaagtgaaccaagcgcattgggaagca 
tttcaacaaaataaattgactaaagatgaagtgctgtcacaacgatttattaattatttt 
aatgactatcaaattcatgtaaatggaaaagaagctgatgagtgctttagagctgaatta 
gcaaaggcaccagttaaattatttgatcatacattagaagttatacaacaattaaaatta 

aatcattctctctatatagtaaccaatggtgtaacagaaacacagctacgacgaattgct
cagacacaatttaatgaaatatttcaagatgtctttatatctgaacaagctggatttcaa 
aagtcgatgacagagttcttcgattttgtgtttgaacatatcggagagaataacaggaat 
caaactctaattgtgggagattctttaacgtctgacattttaggtggtaaaaatgctaat 
atatcaacatgttggtttaatattagacaaaaagaaaaccatacgtctattcaaccggat 

tatatcattaatgatttatcagaaatgattcgcattgttgagtga

Sequence 3124 

VDFYDAEKKAFYNLAQKYNHQPTQQDFEHFKKVNQAHWEAFQQNKLTKDEVLSQRFINYF 
NDYQIHVNGKEADECFRAELAKAPVKLFDHTLEVIQQLKLNHSLYIVTNGVTETQLRRIA 
QTQFNEIFQDVFISEQAGFQKSMTEFFDFVFEHIGENNRNQTLIVGDSLTSDILGGKNAN
ISTCWFNIRQKENHTSIQPDYIINDLSEMIRIVE* 

Sequence 3125 
Contig_0645_pos_2339_3055

is similar to (with p-value 1.0e-45)
>gp:gp|AB014075|AB014075_8 Clostridium histolyticum genes f
or hypoxanthine-guanine phosphoribosyl-transferase (HGPRTase
), GTPase and 12 ORFs, complete and partial cds. NID: g38688
63.

atgggacgtaaatggaacaacattaaagagaaaaaagcccaaaaagataaaaatactagt 
agaatatatgccaaatttggtaaagaaatatatgtagctgcaaagtctggtgagcctaat 
ccagagtcaaatcaaactttaagattagtattagaacgtgcaaaaacatattcagtacct 
aatcatattatagatagagctattgataaggctaaaggcgctggtgacgaaaactacgat 

cacttaagatatgaaggttttggtccgaatggttcaatgcttatagttgacgcattaaca
aacaatgtaaatcgtacagcatcagatgtacgtgctgcgttcggtaagaatggaggaaat 
atgggagtatctggttcagtagcttatatgtttgaccatactgcaacctttggtgtagaa 
ggtaaatctgtagatgaagtcttagaaacactaatggagcaagatattgatgtaagagat 
gtaattgatgacaatggcttgactattgtttacgcagaaccagatcaatttgcacaagtt 

caggatgcattacgtgaagctggcgttgaggaatttaaagtagcagagtttgaaatgtta
cctcaaactgatattgagttgtctgaagaggatcaagctatttttgaaaaattaatcgat 
gcacttgaagacttggaagatgttcaaaatgttttccataatgtagatttaaaataa 

Sequence 3126 

MGRKWNNIKEKKAQKDKNTSRIYAKFGKEIYVAAKSGEPNPESNQTLRLVLERAKTYSVP
NHIIDRAIDKAKGAGDENYDHLRYEGFGPNGSMLIVDALTNNVNRTASDVRAAFGKNGGN
MGVSGSVAYMFDHTATFGVEGKSVDEVLETLMEQDIDVRDVIDDNGLTIVYAEPDQFAQV
QDALREAGVEEFKVAEFEMLPQTDIELSEEDQAIFEKLIDALEDLEDVQNVFHNVDLK* 

Sequence 3127

Contig_0645_pos_4722_5774

is similar to (with p-value 1.0e-57)

>sp:sp|P39676|FHP_YEAST FLAVOHEMOPROTEIN (HAEMOGLOBIN-LIKE
PROTEIN) (FLAVOHEMOGLOBIN).

atgtttgaggctaacccagaattattaaacatgtttaaccaaacaaaccaaaagaaaggt
atgcaatctgctgcattagcacaagcagtactagctgcagcaatgaatattaataattta 
ggtgcaattaagccagcaattatgcctgtggcacataagcactgtgctttacaagtttat 
cccgaacattatccaattgtaggtgaaaatttacttgctgcaattcaagatgtgacaggg 
ttagaaagtgacgacccagtaattcaaacatgggcaaaagcgtatggagaaattgcagat 

gtatttatcaaattagagcaagaaatttacaaccatatgttatggaaaggttttaaacca
tttaaaatcacaaacattacacaagaaacaagtgacatcaaatctttcacagttgaatct 
gaagaatatgatttaagtcaattcgaaccaggtcaatacattaccgtagatgtttctagt 
gaaaagttaccatatagagctaaacgtcattattcaatcatagatggagatgaaaatcac 
ttagtatttggtgtcaaacgtgatgtgactactgaacatgaaggtgaagtttcaacaatt 

ttacatgatgaaatatcagaaggtgacatgattaatttatctgctcctgtaggtggcttt
tcaatagaaaatactgaaagaccgcaattgtttattggttctggcgtaggtatgacacca 
ttagtttcaatgtttaaaaaagctgcatcattaaacgttccaactcaaatgattcaagcg 
gttgtgacagaagatgaacgaccatttgctcaaaaacttgatagcattacagataattat 
gagcaagcacagctacatttacacgtgaaagataaagaaggttatttagaagctaaagaa 

ttagaacaatatttaagtgaacagcctgaaatttatatttgtggtggtacgaaattctta
cattcaattataaattctcttaaagaattaaattatgatatgaatcatgttcattttgaa 
acatttattcctcgtttaagtgttcaagtatag 

Sequence 3128 

MFEANPELLNMFNQTNQKKGMQSAALAQAVLAAAMNINNLGAIKPAIMPVAHKHCALQVY
PEHYPIVGENLLAAIQDVTGLESDDPVIQTWAKAYGEIADVFIKLEQEIYNHMLWKGFKP 
FKITNITQETSDIKSFTVESEEYDLSQFEPGQYITVDVSSEKLPYRAKRHYSIIDGDENH 
LVFGVKRDVTTEHEGEVSTILHDEISEGDMINLSAPVGGFSIENTERPQLFIGSGVGMTP 
LVSMFKKAASLNVPTQMIQAVVTEDERPFAQKLDSITDNYEQAQLHLHVKDKEGYLEAKE
LEQYLSEQPEIYICGGTKFLHSIINSLKELNYDMNHVHFETFIPRLSVQV*

Sequence 3129 
Contig_0650_pos_7844_8518 

is similar to (with p-value 1.0e-98)

>gp:gp|U50335|MSU50335_2 Mycobacterium smegmatis phage resi
stance (mpr) gene, complete cds. NID: g1477566.
atgaactatttcagatataaacaatttaacaaggatgttatcactgtagcggttggctac 
tatctaagatatgcattgagttatcgtgatatatctgaaatattaagggaacgtggtgta 

aacgttcatcattcaacggtctaccgttgggttcaagaatatgccccaattttatatcaa
atttggaagaaaaagcataaaaaagcttattacaaatggcgtattgatgagacgtacatc 
aaaataaaaggaaaatggagctatttatatcgtgccattgatgcagagggacatacatta 
gatatttggttgcgtaagcaacgagataatcattcagcatatgcgtttattaaacgtctc 
attaaacaatttggtaaacctcaaaaggtaattacagatcaggcaccttcaacgaaggta 

gcaatggctaaagtaattaaagcttttaaacttaaacctgactgccattgtacatcgaaa
tatctgaataacctcattgagcaagatcaccgtcatattaaagtaagaaagacaaggtat 
caaagtatcaatacagcaaagaatactttaaaaggtattgaatgtatttacgctctatat 
aaaaagaaccgcaggtctcttcagatctacggattttcgccatgccacgaaattagcatc 
atgctagcaagttaa


Sequence 3130 

MNYFRYKQFNKDVITVAVGYYLRYALSYRDISEILRERGVNVHHSTVYRWVQEYAPILYQ 
IWKKKHKKAYYKWRIDETYIKIKGKWSYLYRAIDAEGHTLDIWLRKQRDNHSAYAFIKRL 
IKQFGKPQKVITDQAPSTKVAMAKVIKAFKLKPDCHCTSKYLNNLIEQDHRHIKVRKTRY 
QSINTAKNTLKGIECIYALYKKNRRSLQIYGFSPCHEISIMLAS*

Sequence 3131 
Contig_0650_pos_6596_5145 
>gp:gp|U40385|SEU40385_1 Staphyloccous epidermidis plasmid
pSK818 insertion sequence IS257(818B) putative transposase g
ene, complete cds. NID: g1762099. >gp:gp|U40386|SEU40386_1 S
taphyloccous epidermidis plasmid pSK818 insertion sequence I
S257(818C) putative transposase gene, complete cds. NID: g17
62101.

atggttgaaaaattaaaacatgaatggtttaaccagccaggtaaaaatatacttgccggt
atcgtggttgccttagctttaatccctgaagctatcgcattttcaattattgctggcgta 
gatccaatggttggtttgtatgcttcatttatcatcgctgttgttactgctattgttggt 
ggtagacctgcaatgatatcaggtgcaacaggggctgttgccttattagttacaccactt 
gtgaaagattatggtgtagaatatcttttagctgccacgatattaatgggagtaattcaa 

ttagttttaggccttctcaaagtggggcgtttaatgaaatttatacctcattccgtcatg
ataggttttgtaaatgcattaggtattatgattttcatgtcccaaatagaacatatcttc 
ggtatttcaatctcaacttatatatatgtaattgtaacattactcattgtatatataatc 
cctaaatttttcaaagcaatacctgcaccattaatagctatcatcgtattgactgctctt 
tatatgtatacaggttctgacgtgagaactgttggtgatttaggaaatattaagcaagct 

ttaccgcactttttaattcctaatgttccctttaatttagaaacacttcaaatcattttt
ccatactcgctatctatggctattgtaggtctagtagaaagtttacttactgctaaaatt 
gtagatgatgcaacagacacttatagtagtaaaaacagagaatctcgtggccaaggcatt 
gctaatatgattacaggattattcggtggtatgggaggttgtgccatgattggacaatct 
gtaatcaatgtcaaatcaggtgcaaacagtagattatctactttttctgctggtgttgtc 

ttaatattcatgattattgttcttggaggacttgttgttcaaattccaatgccaatttta
gcaggtattatggttatggtttcgattggtacatttgattggaattcttttaaatatatt 
caaaaagcaccaaaaacagatgcagttgttatgatacttacagtgataattgtactgatg 
acacataacttagctctcggcgtggtcgtaggtgttattttcagtgctttattctttgct 
actaaaatatcaaaagtagaagtaacatctgagaagtttggtaaaactaaccgtttatct 

tttaaaggtcaaatcttttttgtttctattgactctatgatggatcaaattagctttaat
attgaaaatagtattatagaattaaactttaataatgctcatttatgggatgattcagca 
gtagatgctattgatacaatggtaaggaagttcgaagaaaaaaataacattgttcatgta 
gaaaaactaaattcagatagtcgtaaaatagtctcagaattaagcaaactaaatgaaaat 
catttaaactaa 
Sequence 3132

MVEKLKHEWFNQPGKNILAGIVVALALIPEAIAFSIIAGVDPMVGLYASFIIAVVTAIVG
GRPAMISGATGAVALLVTPLVKDYGVEYLLAATILMGVIQLVLGLLKVGRLMKFIPHSVM 
IGFVNALGIMIFMSQIEHIFGISISTYIYVIVTLLIVYIIPKFFKAIPAPLIAIIVLTAL
YMYTGSDVRTVGDLGNIKQALPHFLIPNVPFNLETLQIIFPYSLSMAIVGLVESLLTAKI 
VDDATDTYSSKNRESRGQGIANMITGLFGGMGGCAMIGQSVINVKSGANSRLSTFSAGVV
LIFMIIVLGGLVVQIPMPILAGIMVMVSIGTFDWNSFKYIQKAPKTDAVVMILTVIIVLM
THNLALGVVVGVIFSALFFATKISKVEVTSEKFGKTNRLSFKGQIFFVSIDSMMDQISFN
IENSIIELNFNNAHLWDDSAVDAIDTMVRKFEEKNNIVHVEKLNSDSRKIVSELSKLNEN
HLN* 

Sequence 3133 

Contig_0651_pos_3288_1672

is similar to (with p-value 5.0e-34)

>sp:sp|P31679|YAAU_ECOLI HYPOTHETICAL METABOLITE TRANSPORT
PROTEIN IN CARB-KEFC INTERGENIC REGION (ORF65/66). >gp:gp|AE
000114|AE000114_12 Escherichia coli K-12 MG1655 section 4 of
400 of the complete genome. NID: g1786217.

atgaaaataagggatgaaaatatggatttcgttaaatcaaaaactgacttatttagactc
atagacaatgaagcgcaaacatcgacatctaagatggttttattcttaatattaggaact 
atatttttagatgcatatgatattactattttaggtacaatgactgatcaactcactcaa 
cagtttcacttatcaccatcaacgctatctatagtaatgacctctttacctattggtgca 
ttatttggtgcattacttggtggtacattagcacatcagtttggacgcaagcatatttta 

tcaattgccttactaacactcactgtaacctctcttggtgcggcactcgcaccaaatgta
attattctaataatatgtcgttgtataatggggtttgctattggaatggatagtccagtt 
gctttcacttttattgcggaaataagtaatttaaagcacaaaggaagaaatgttaactat 
tggcaagtcgtttggtatgttgcaatagttacttctgctttagtggtcattgcgttcttt 
atgctaggggctggtgcacatttgtggagatatgcaattggatttggtgcacttattgct 

tttgtcttgtacatcttgcgaattaaatatttacacgaaagtccgacatgggtgattaat
cattattctttagaaaaagcaactgaatttgtaagaaaatattatcataaagacatccac 
ctagagggaacgcttgaagatgatttaagttctgatgtgacttcgccacataattcttgg
acagacttatttaaaccgagatatataaaaagaattatcctggcgactgcgatttcaaca
ttacaaggtatgcagtactatggtgtcggtttatacatacctattattgcaacttatctt 

attagtaaggataaaataggtgtattattaggtactgctatagtcaatatagcaggtatt 
ctaggcgcatatttaggtgctcaattgacttataaattaggtacacgcaagcttacaatg 
ataggcttcacacttgtattactttcaatggtatgtgtaggactcttttatcatcatcta 
ccaatgcttcttaacactttccttattggattatttttatttggccattcaggaggtcct 
ggtactcaaggaaaaacaattggtgccttatcattcccgactcatttacgttcacaagct 

actggctttgtagaatctgtaagtcgtactggtagtatcataggtacttttgtctttcca 
atcattcttgctgcagtaggtctaacgaatactatgttaatcttgtccattgtcccttta 
ctcggaattatcataacagtatctataaaatgggaagctgtcggtaaggaatacattgtt 
gaatataacgctactttggcattaaacgatatagaaagatcgataattagaaaagaatta 
acattagcttttaaaagaagcgaagtcaaactcagtcgaatggaaagacgtatcattcga 

ttactacttaatgattacaagccaaaggagattgctatggttttaaatttggaatccaaa 
gttgtttataatgcgattcaacgtagtaaatgtaaacttaaaagaagttttgaataa 

Sequence 3134 

MKIRDENMDFVKSKTDLFRLIDNEAQTSTSKMVLFLILGTIFLDAYDITILGTMTDQLTQ 
QFHLSPSTLSIVMTSLPIGALFGALLGGTLAHQFGRKHTLSIALLTLTVTSLGAALAPNV 
IILIICRCIMGFAIGMDSPVAFTFIAEISNLKHKGRNVNYWQVVWYVAIVTSALVVIAFF 
MLGAGAHLWRYAIGFGALIAFVLYILRIKYLHESPTWVINHYSLEKATEFVRKYYHKDIH 
LEGTLEDDLSSDVTSPHNSWTDLFKPRYIKRIILATAISTLQGMQYYGVGLYIPIIATYL 
ISKDKIGVLLGTAIVNIAGILGAYLGAQLTYKLGTRKLTMIGFTLVLLSMVCVGLFYHHL 
PMLLNTFLIGLFLFGHSGGPGTQGKTIGALSFPTHLRSQATGFVESVSRTGSIIGTFVFP 
IILAAVGLTNTMLILSIVPLLGIIITVSIKWEAVGKEYIVEYNATLALNDIERSIIRKEL 
TLAFKRSEVKLSRMERRIIRLLLNDYKPKEIAMVLNLESKVVYNAIQRSKCKLKRSFE* 

Sequence 3135 
Contig_0652_pos_1335_2843 
>sp:sp|O69282|MQO_CORGL MALATE:QUINONE OXIDOREDUCTASE (EC 1 
.1.99.16) (MALATE DEHYDROGENASE (ACCEPTOR)) (MQO). 
gtgagcaaaaaaatggctaataaagagtcaaaaaatgttgttattattggcgctggtgtc 

ttaagtacgacatttggttctatgattaaagaattagaacctgattggaacatcaaactc 
tatgaacgcttagatcgtccaggtattgaaagttctaacgaaagaaacaatgccggtaca 
ggacatgcggcgttatgtgaattgaactatacagtacaacaacctgatggttcaattgat 
atagaaaaagccaaagaaatcaacgaacaattcgagatttcaaaacaattctggggtcac 
ttagtaaaaagtggtaacatcagtaaccctagagatttcattaatccacttcctcacatt 

agtttcgtaagaggtaaaaataacgttaaattcttaaaaaaccgttacgaagcaatgcgt 
aacttccctatgttcgataacatcgaatatacagaagatatcgaagaaatgagaaaatgg 
atgccattaatgatgacaggccgtactggtaacgaaatcatggcggctagtaaaatcgac 
gaaggtacagatgttaactacggtgaattaactcgtaaaatggcaaaaagtattgaaaaa 
catccaaatgctgatgttcaatacaaccacgaagtaattaatttcaatcgtcgtaaagac 

ggtatttgggaagttaaagttaaaaaccgtaattctggagacgttgaaactgttctagct 
gattatgtatttatcggtgcaggcggtggcgctattccactattacaaaaaactggtatc 
ccagaaagtaaacatcttggtggattccctatcagtggtcagttcttaatttgtacaaac 
cctgatgtaattaatgaacatgacgtcaaagtatatggtaaagaaccaccaggcacacct 
ccaatgactgtaccacatttagatacacgttatatcgatggtgaaagaacattattattt 

ggaccatttgcaaatattggccctaaattcttaagaaacggttctaacttagacttattc 
aaatcagttaaaccttataacatcacaacattactagcatctgcagttaaaaacttacct 
ttaatcaaatactctatcgaccaagtattaatgactaaagaaggttgtatgaaccatcta 
cgcacgttctaccctgaagctcgtgacgaagattggcaattatacactgcaggtaaacgt 
gttcaagttatcaaagatactaaagaacacggtaaaggattcattcaatttggtacagaa 

gttgttaactctaaagaccactctgttatcgcactattgggtgaatcacctggagcatca 
acttcagtatcagtagccctagaagttttagagaaaaactttgctgagtatgaaaaagat 
tggactccaaaattacaaaaaatgatcccatcatatggtaaatctcttatcgatgatgtt 
aagttaatgagagcaactcgtaaacaaacatctaaagatttagaattaaattattacgaa 
tctaaataa 

Sequence 3136 

VSKKMANKESKNVVIIGAGVLSTTFGSMIKELEPDWNIKLYERLDRPGIESSNERNNAGT 
GHAALCELNYTVQQPDGSIDIEKAKEINEQFEISKQFWGHLVKSGNISNPRDFINPLPHI 
SFVRGKNNVKFLKNRYEAMRNFPMFDNIEYTEDIEEMRKWMPLMMTGRTGNEIMAASKID 

EGTDVNYGELTRKMAKSIEKHPNADVQYNHEVINFNRRKDGIWEVKVKNRNSGDVETVLA 
DYVFIGAGGGAIPLLQKTGIPESKHLGGFPISGQFLICTNPDVINEHDVKVYGKEPPGTP 
PMTVPHLDTRYIDGERTLLFGPFANIGPKFLRNGSNLDLFKSVKPYNITTLLASAVKNLP 
LIKYSIDQVLMTKEGCMNHLRTFYPEARDEDWQLYTAGKRVQVIKDTKEHGKGFIQFGTE 
VVNSKDHSVIALLGESPGASTSVSVALEVLEKNFAEYEKDWTPKLQKMIPSYGKSLIDDV 

KLMRATRKQTSKDLELNYYESK* 

Sequence 3137 

Contig_0652_pos_7388_7999 

is similar to (with p-value 9.0e-32) 

>sp:sp|P15029|FECD_ECOLI IRON(III) DICITRATE TRANSPORT SYST 
EM PERMEASE PROTEIN FECD. >pir:pir|S56513|S56513 citrate-dep 
endent iron transport protein fecD - Escherichia coli >gp:gp 
|U14003|ECOUW93_200 Escherichia coli K-12 chromosomal region 
from 92.8 to 00.1 minutes. NID: g1263172. >gp:gp|AE000499|A 
E000499_8 Escherichia coli K-12 MG1655 section 389 of 400 of 
the complete genome. NID: g1790732. 
atgatcataatgatatttccatcagcacctctatttgttcttcctttaggttcatttatc 
ggtgctttgacaataagtattattctttcagttcttatttcaaaatttgatgtaaaggga 
tcaaaattagcattgataggtttagcgataggtgcaatttgtacggccattgtccaattc 

ttgcttatacgtaatcctcttgatgcaaataatgcgttattatggttgactggtagttta 
tacggtcataatatagtcaatttttatagtttattaccatggtttattatcactgtacct 
atagtattgttattagggtatcaacttgatattttaaatttaggtgatcatgtagccatt 
gcactaggagcacgtgtaaaaatcttaaaaatgattttacttgtattagcagtaatgtta 
gcaggtgcttccattgcggtagtagggggtattagttttttaggtcttatagcacctcat 
attgcacgtcaacttgtcggccataaaaatatacatgttataatcatgtcaggtttggta 
ggagcaatattattaacttttggtgatggtttagcaagaggtatacaacctcctcttgat 
attcctgtatga 

Sequence 3138 

MIIMIFPSAPLFVLPLGSFIGALTISIILSVLISKFDVKGSKLALIGLAIGAICTAIVQF 
LLIRNPLDANNALLWLTGSLYGHNIVNFYSLLPWFIITVPIVLLLGYQLDILNLGDHVAI 
ALGARVKILKMILLVLAVMLAGASIAVVGGISFLGLIAPHIARQLVGHKNIHVIIMSGLV 
GAILLTFGDGLARGIQPPLDIPV* 

Sequence 3139 
Contig_0553_pos_2824_2252 
is similar to (with p-value 6.0e-31) 
>sp:sp|P14776|DHSS_SYNP1 SOLUBLE HYDROGENASE, SMALL SUBUNIT 
(EC 1.12.-.-) (TRITIUM EXCHANGE SUBUNIT). >pir:pir|S06919|H 
QYCSS soluble hydrogenase (EC 1.12.-.-) small chain - Synech 
ococcus sp. (PCC 6716) >gp:gp|X16658|SYNSOLHY_1 Synechococcu 
s DNA for the small subunit of soluble hydrogenase. NID: g48 
053. 

atgctaccaccaggtctagcatttgttgcttatagcgatagagcaaaaaaacgatttgct 
gatgtaaaaacaccgagattctatttagatttaaataaatacataaaatcacaagagcaa 
aattcaacgcccttcacccctaatgttggtctatttagaggaataaatgcttatgtagaa 
cttgtaaaaaaagaaggattaaatcacgttatttcacgccattttaaaatacgtaatgcc 
ttaagagcagcactaaaggcacttgaattagaattattagtaaaagatgatgctcatgcc 

tcacctactgtgacctcatttgttccaaaaaatcaagaagaacttaatatcattaaaaat 
caacttaaatctcaattcaatataactattgctgggggacaaggacacttaaaaggacaa 
attttgagaattggtcacatggggaaaatatctccttttgatattttagcagtcgtgtct 
gcattggaaattattttaacttctaatagaaatgtcaattatattggaacagggataact 
caatttatggaggttattagacatgagtcataa 

Sequence 3140 

MLPPGLAFVAYSDRAKKRFADVKTPRFYLDLNKYIKSQEQNSTPFTPNVGLFRGINAYVE 
LVKKEGLNHVISRHFKIRNALRAALKALELELLVKDDAHASPTVTSFVPKNQEELNIIKN 
QLKSQFNITIAGGQGHLKGQILRIGHMGKISPFDILAVVSALEIILTSNRNVNYIGTGIT 
QFMEVIRHES* 

Sequence 3141 
Contig_0657_pos_4668_5093 
is similar to (with p-value 2.0e-19) 
>gp:gp|AF001974|AF001974_1 Thermoanaerobacter ethanolicus p 
utative TrkG gene, partial cds, and putative TrkA, xylose is 
omerase (xylA) and xylulose kinase (xylB) genes, complete cd 
s. NID: g2581794. 

gtgagaccgactgtaccgaatgctgatacgacttcgaagagtattttgattaaaggtata 
ttggaattgattatggtaagtataaacgtaaccataccgataaatgcaatagaaatgaga 
atagttacgaaagacaactgtatgtatctttctgatatttctctattaaatatagagttg 
tttttttctttgcgtatcgtattaaaaatagcgattgttgcaataacaaaagttgtaacc 
tttataccacctgcagcactcaatggtgcacctccaataaacatgagagccataagtaat 
aaagctgtcggtgttttaatgtttccaacgtcaattgtgttaaatcctgcagtccttgtt 
gtcactgattggaaaaatgcatttcctattttttcaattaatcccatgtgtaacatagag 
ttttga 

Sequence 3142 

VRPTVPNADTTSKSILIKGILELIMVSINVTIPINAIEMRIVTKDNCMYLSDISLLNIEL 
FFSLRIVLKIAIVAITKVVTFIPPAALNGAPPINMRAISNKAVGVLMFPTSIVLNPAVLV 
VTDWKNAFPIFSINPMCNIEF* 

Sequence 3143 

Contig_0660_pos_3013_3495 
is similar to (with p-value 3.0e-18) 

>gp:gp|D89592|D89592_3 Vibrio alginolyticus rhlE, KtrA and 
KtrB genes, complete cds. NID: g3927863. 

atgaatgaaataagtcctacacgacctataaacattaatatcattaatacagccttggta 
atgtcgttgacatcgcttgtcacacctaagcttaaaccacatgtaccaaacgcagacatt 
acttcaaagaaaatttgtaaaaaggacaacttgccttgttcagtggcagatataataatc 
atgctaataaatgtaattaatgacgccatagtaaatacggcaaatgatctttgtacaccc 
ataatatgtacttctctgttgaaaattttaatacctgttttgtcaccagtattattaaaa 
ttaataacaaacaaaattaaaatagcaaatgttgttgttctgattccacctcctacagaa 
ctgggagatgatcctataaacatcaataatcccattacaatatttgttgcgtcgctgaaa 
tgtgacacatctatcgtttgcaaacctgcacttctggtcgttgatgattggaacaatgca 
taa 

Sequence 3144 

MNEISPTRPININIINTALVMSLTSLVTPKLKPHVPNADITSKKICKKDNLPCSVADIII 
MLINVINDAIVNTANDLCTSIICTSLLKILIPVLSPVLLKLITNKIKIANVVVLIPPPTE 
LGDDPININNPITIFVASLKCDTSIVCKPALLVVDDWNNA* 

Sequence 3145 
Contig_0660_pos_988_47 

is similar to (with p-value 3.0e-76) 
>sp:sp|Q46807|ARCL_ECOLI CARBAMATE KINASE-LIKE PROTEIN 1. > 
gp:gp|U28375|ECU28375_24 Escherichia coli K-12 genome; appro 
ximately 64 to 65 minutes. NID: g887800. >gp:gp|AE000370|AE0 
00370_9 Escherichia coli K-12 MG1655 section 260 of 400 of t 
he complete genome. NID: g2367170. 

gtgagtgaaatggctaaaattgtagtagctttaggtggaaacgctttaggaaaatcacca 
caagaacaacttgaattagtaaaaaatacagctaaatccctagtaggattaattactaaa 
ggtcacgaaattgtgattagtcacggtaatggaccacaagtaggaagtattaaccttggt 

ctgaattatgcagctgaacacgatcaaggtcctgcttttccatttgctgaatgtggcgct 
atgagtcaagcctacatcggctatcaacttcaagaaagtttacaaaatgaacttcattca 
atgggcatagataagcaagttgtcacactagttacccaagtagaagttgatgaaggcgat 
ccagcttttaatagtccaagtaaacccatcggtctgttctacactaaagaagaagcaaat 
cgtattcaacaggaaaaaggttatcaatttgtagaagatgctggtcgaggttaccgtcgc 

gttgtaccatcaccacaaccaatatctattatcgaactggaaagtattaaaactctagta 
gaaaatgacacactcgtcatcgctgcaggtggaggtggtataccagtcattcgcgaacag 
catgatagctttaaaggtatagatgccgtcatcgataaagacaaaacaagtgcattatta 
ggtgctgatattcactgtgatcaactcattattttaacagcgattgattatgtttatatc 
aactatcatactgaccaacaacaagcacttaaaacaacaaatatagatacgcttaaaaca 

tatattgaagaagaacaatttgccaaaggcagcatgctacctaaaatcgaatctgccatc 
tcctttattgaaaataatcctaacggtagcgtgctcatcacatcattaaatcaattagat 
gcagcactagaaggtaaaattggcacactcattacaaagtaa 

Sequence 3146 

VSEMAKIVVALGGNALGKSPQEQLELVKNTAKSLVGLITKGHEIVISHGNGPQVGSINLG 
LNYAAEHDQGPAFPFAECGAMSQAYIGYQLQESLQNELHSMGIDKQVVTLVTQVEVDEGD 
PAFNSPSKPIGLFYTKEEANRIQQEKGYQFVEDAGRGYRRVVPSPQPISIIELESIKTLV 
ENDTLVIAAGGGGIPVIREQHDSFKGIDAVIDKDKTSALLGADIHCDQLIILTAIDYVYI 
NYHTDQQQALKTTNIDTLKTYIEEEQFAKGSMLPKIESAISFIENNPNGSVLITSLNQLD 

AALEGKIGTLITK* 

Sequence 3147 
Contig_0661_pos_4592_5317 
is similar to (with p-value 5.0e-26) 
>sp:sp|P23553|XYNC_CALSA ACETYL ESTERASE (EC 3.1.-.-). >pir 
:pir|B37202|B37202 acetylesterase (EC 3.1.1.6) (XynC) - Cald 
ocellum saccharolyticum >gp:gp|AF005383|AF005383_9 Caldicell 
ulosiruptor saccharolyticus putative transport protein (XynG 
), putative transport protein (XynH), xylanase (XynF), xylan 
ase (XynE), xylanase (XynD), xylanase (XynA), acetylxylosida 
se (XynC) and xylanase (XynB) genes, complete cds. NID: g264 
5411. >gp:gp|M34459|CDCXYNAB_2 C.saccharolyticum xylanase A 
(XynA), beta-xylosidase (XynB) and acetyl esterase (XynC) ge 
nes, complete cds. NID: g144295. 

atgcatcaaacgattaatgtcattcttccagaagataaaagttactttgatacaaatgaa 
aatgcgaaaccattaaaaactatgttattgctacatggtttatcaagtgatacttcctct 
tatatgagatatacgagtatagaacgctatgcgaatacccaccaactagcagtggtgatg 
cctaatgctgatcatagtttctattcaaatatggcttatggacatagttattatgactat 

atactagaagtttatgattatgttcatcaaatatttccattgtctaaaaagagagaagat 
aattttatagcaggtcactctatgggaggttatggtgcaatcaaatttgcattaacgcaa 
agttatcgtttctcaaaagccgctatgctttcagcgccatatgatgtttctatgattggt 
caatatcaatggtatgattttactccagaagcgattgtaggtaatacgcaacatgtcgcg 
gggacatcttttgatccatactatttagttgaacaagcaatagacaatggacaaacgtta 

ccacaactatatattacttgtggaactgaagatgaattgtatcaaggtaatattgatttt 
gtgaactatttagatgaaaaaggtatttcatatcaatttaaaaaagcgccaggtcatcac 
gattatgcattttgggataaagcaatagaagatgtcattgaccgttttacatcatcacat 
atttaa 

Sequence 3148 

MHQTINVILPEDKSYFDTNENAKPLKTMLLLHGLSSDTSSYMRYTSIERYANTHQLAVVM 
PNADHSFYSNMAYGHSYYDYILEVYDYVHQIFPLSKKREDNFIAGHSMGGYGAIKFALTQ 
SYRFSKAAMLSAPYDVSMIGQYQWYDFTPEAIVGNTQHVAGTSFDPYYLVEQAIDNGQTL 
PQLYITCGTEDELYQGNIDFVNYLDEKGISYQFKKAPGHHDYAFWDKAIEDVIDRFTSSH 

I* 

Sequence 3149 
Contig_0667_pos_3956_3216 
is similar to (with p-value 2.0e-20) 
>gp:gp|Y12813|BPA2INT_1 Bacteriophage A2 rep, xis and int g 
enes. NID: g3005824. 

atgcaagatttacaagtatttaattttgaagatttaccagtaagaaaaatagaagtagat 
ggagaaccatattttttaggtaaagacgtggcagaaatattaggttacacaagatctgat 
aatgcaattagaaatcatgttgatgatgaagataagctgacgcaccaagttagtgcatca 

ggtcaaaaacgaaacatggtaatcatcaacgaatctggtttatacagcttaatctttgac 
gctgctaaacaaagtaaaaacgaaagtattagaaagaaagctaaacgttttaaacgttgg 
gtaaccgaagatgttttaccttccattcgtaaaacaggtacttatcaagttcctgataat 
ccaatggacgcattgcaacttatgttcgacgcacaaaaacaaaccaaagaagaaatagca 
actgttaaagcagatgttattgatatcaaagaaaatcaaaagctagatgcaggagaatac 

ggattgataacaaaaacagttcatcaacgcgttgcttatatcagacaaattcacggacta 
cctaataataaagaagttaacaaacctttatatagagatattaacagtaacgtaaatacg 
atggctggtattaaaacaagaacacaattaaaacaaaaacatttcgatgacgtaatgaat 
atgatcacaaattggtttccatctcaatcaacaatgtatgtcatcaaacaattagaaatg 
gactttgaaaacgaagtataa 

Sequence 3150 

MQDLQVFNFEDLPVRKIEVDGEPYFLGKDVAEILGYTRSDNAIRNHVDDEDKLTHQVSAS 
GQKRNMVIINESGLYSLIFDAAKQSKNESIRKKAKRFKRWVTEDVLPSIRKTGTYQVPDN 
PMDALQLMFDAQKQTKEEIATVKADVIDIKENQKLDAGEYGLITKTVHQRVAYIRQIHGL 
PNNKEVNKPLYRDINSNVNTMAGIKTRTQLKQKHFDDVMNMITNWFPSQSTMYVIKQLEM 
DFENEV* 

Sequence 3151 
Contig_0670_pos_3378_4094 
is similar to (with p-value 1.0e-45) 

>gp:gp|AB014075|AB014075_8 Clostridium histolyticum genes f 
or hypoxanthine-guanine phosphoribosyl-transferase (HGPRTase 
), GTPase and 12 ORFs, complete and partial cds. NID: g38688 
63. 
atgggacgtaaatggaacaacattaaagagaaaaaagcccaaaaagataaaaatactagt 
agaatatatgccaaatttggtaaagaaatatatgtagctgcaaagtctggtgagcctaat 
ccagagtcaaatcaaactttaagattagtattagaacgtgcaaaaacatattcagtacct 
aatcatattatagatagagctattgataaggctaaaggcgctggtgacgaaaactacgat 
cacttaagatatgaaggttttggtccgaatggttcaatgcttatagttgacgcattaaca 
aacaatgtaaatcgtacagcatcagatgtacgtgctgcgttcggtaagaatggaggaaat 
atgggagtatctggttcagtagcttatatgtttgaccatactgcaacctttggtgtagaa 
ggtaaatctgtagatgaagtcttagaaacactaatggagcaagatattgatgtaagagat 
gtaattgatgacaatggcttgactattgtttacgcagaaccagatcaatttgcacaagtt 
caggatgcattacgtgaagctggcgttgaggaatttaaagtagcagagtttgaaatgtta 
cctcaaactgatattgagttgtctgaagaggatcaagctatttttgaaaaattaatcgat 
gcacttgaagacttggaagatgttcaaaatgttttccataatgtagatttaaaataa 

Sequence 3152 

MGRKWNNIKEKKAQKDKNTSRIYAKFGKEIYVAAKSGEPNPESNQTLRLVLERAKTYSVP 
NHIIDRAIDKAKGAGDENYDHLRYEGFGPNGSMLIVDALTNNVNRTASDVRAAFGKNGGN 
MGVSGSVAYMFDHTATFGVEGKSVDEVLETLMEQDIDVRDVIDDNGLTIVYAEPDQFAQV 
QDALREAGVEEFKVAEFEMLPQTDIELSEEDQAIFEKLIDALEDLEDVQNVFHNVDLK* 

Sequence 3153 

Contig_0673_pos_2 813_0 

is similar to (with p-value 2.0e-34) 

>gp:gp|X81475|MHLMP_1 M.hominis lmp1 and lmp2 genes. NID: g 
587470. 

atgactgaagcaacaattcaaaattataacgctaaacgtcaaaaagcagagcaagttata 
caaaatgcaaataaaattattgaaaacgctcaacctagtgtacaacaagtgtctgatgag 
aaatctaaggtagagcaagcactcagtgaattgaacaacgccaaatcagcgcttagagct 
gataaacaagaattacagcaagcatataatcagttgattcaaccaacggatttaaataat 
aagaaaccagcctctatcactgcgtacaatcaaagatatcaacaatttagtaacgaattg 

aacagcactaaaacaaatacagatcgcattttaaaagagcaaaatccaagtgtagctgat 
gtcaacaatgcactaaataaagtaagagaagtacaacaaaaattaaacgaagccagagca 
cttttacaaaataaagaagataatagtgcactagttcgagccaaagaacaacttcaacag 
gcagttgaccaagtcccttcaacagaaggtatgacgcaacaaactaaagatgattacaat 
tcaaaacaacaagctgctcaacaagaaatatcaaaagcacaacaagttatcgataatggc 

gatgcgactacacaacaaatttctaacgccaaaacaaatgttgaacgcgctttagaagca 
ttaaataatgcaaaaactggtttaagagcagataaagaggaacttcaaaatgcatataat 
caattaactcaaaatattgatacgagcggtaaaacgcctgcaagtatcaggaaatacaat 
gaagctaagtcacgtattcaaactcaaattgattcagctaaaaataaagcaaacagtatt 
ttaacaaatgacaatcctcaagtatcacaagtgactgctgcgttaaacaaaataaaagct 

gttcaacctgaattagataaagcgatagcaatgcttaaaaataaagagaataataatgca 
ttggttcaagcgaaacaacaacttcaacaaattgttaatgaagtagatccaacacaaggc 
atgacaacagatactgctaataactataaatcaaaaaaacgtgaagctgaagatgaaata 
caaaaagctcaacaaatcattaacaatggcgatgccactgagcaacaaattactaacgaa 
acaaatagagtaaatcaagcgattaatgcaataaacaaagccaaaaacgatttacgtgct 

gataagtctcaattggaaaatgcttataaccaattaatacaaaatgttgatacaaatggt 
aaaaaacctgctagtattcaacaataccaagctgctcgacaagctattgagacgcaatac 
aataacgctaaatcagaagcacatcaaattcttgaaaatagtaacccttcagttaatgaa 
gtagcacaagcattacaaaaagttgaagctgtacaacttaaagttaatgacgcgattcat 
atgcttcaaaataaagagaataatagtgcacttgtcacagctaaaaatcaacttcagcaa 

gcagttaatgatcaaccattaacaacaggtatgactcaagattctattaataactatgta 
gctaagagaaatgaggctcaaagtgctatcagaaatgcagaagctgtcatcaacaatggc 
gatgcaactgcaaaacaaatttcagacgagaaatctaaagttgaacaagcactagcacat 
ttgaatgatgctaaacagcaattaactgcagatactactgaattacaaacagcagttcaa 
caattaaacagaagaggcgatacaaataataaaaagccaagaagtatcaatgcatataat 

aaagcaattcaatcattagaaacacaaattacttctgctaaagataatgccaacgctgtg 
atacaaaaacctatacgtactgttcaagaggtaaataatgcattacaacaagtaaatcag 
ttgaatcaacaattaactgaagcaattaatcaacttcaaccgctatcaaataatgatgca 
ttaaaagctgcaagattaaatttagaaaataaaattaatcaaactgtacaaactgatggt 
atgacacaacaatctatagaggcttatcaaaacgctaaacgcgtagcccaaaatgaatct 
aacactgctttagcattaattaataacggcgatgccgatgaacaacaaattacaactgaa 
acagaccgagtcaatcagcaaactacaaacttaactcaagcaattaacgggttaacagtt 
aataaagaaccattagaaaccgctaaaacagcgttacaaaataacatcgaccaggtacct 
agtacagatggtatgactcagcaatctgttgcaaattataatcaaaaactacaaatagct 

aaaaacgaaattaacacaattaataacgttttagcgaacaatctagatgttaatgcaatc 
aaaacgaataaagcagaagcggaacgaatcagtaacgatttaacacaagctaagaataac 
ttacaagttgatactcaacctttagaaaaaataaaaagacaacttcaagatgaaattgat 
caaggtactaacacagatggaatgactcaagattcagtggataattacaatgatagctta 
agtgcagcaattatagaaaaaggcaaagtaaataaattacttaaacgtaatccgacagta 

gaacaagttaaagagagcgttgctaatgcacaacaagtcatacaagatttacaaaatgct 
cgaacttcacttgttccagacaaaactcaacttcaagaagctaaaaatagattagaaaac 
agtattaaccaacaaacagatactgacggcatgactcaagattcgcttaacaattataat 
gataaattagcaaaagctagacaaaaccttgaaaaaatatctaaagttttaggtggtcaa 
cctactgtagctgaaattagacaaaatacagatgaagcaaatgcacataaacaagcatta 

gacactgcacgttctcaacttacattaaatagagagccatatatcaatcatattaataat 
gaaagtcatttaaataacgcgcaaaaagataattttaaagctcaagttaactcagcacct 
aatcataatactttagaaacgattaaaaataaggctgatactttaaatcaatctatgaca 
gcattaagtggaagtattgcagattacgaaaatcaaaaacaacaagaaaattatttagat 
gcatctaacaataaacgtcaagactatgacaatgcagtcaatgcggctaaaggtatttta 

aaccaaactcaaagtccgacaatgagtgctgatgtgattgatcaa 

Sequence 3154 

MTEATIQNYNAKRQKAEQVIQNANKIIENAQPSVQQVSDEKSKVEQALSELNNAKSALRA 
DKQELQQAYNQLIQPTDLNNKKPASITAYNQRYQQFSNELNSTKTNTDRILKEQNPSVAD 

VNNALNKVREVQQKLNEARALLQNKEDNSALVRAKEQLQQAVDQVPSTEGMTQQTKDDYN 
SKQQAAQQEISKAQQVIDNGDATTQQISNAKTNVERALEALNNAKTGLRADKEELQNAYN 
QLTQNIDTSGKTPASIRKYNEAKSRIQTQIDSAKNKANSILTNDNPQVSQVTAALNKIKA 
VQPELDKAIAMLKNKENNNALVQAKQQLQQIVNEVDPTQGMTTDTANNYKSKKREAEDEI 
QKAQQIINNGDATEQQITNETNRVNQAINAINKAKNDLRADKSQLENAYNQLIQNVDTNG 

KKPASIQQYQAARQAIETQYNNAKSEAHQILENSNPSVNEVAQALQKVEAVQLKVNDAIH 
MLQNKENNSALVTAKNQLQQAVNDQPLTTGMTQDSINNYVAKRNEAQSAIRNAEAVINNG 
DATAKQISDEKSKVEQALAHLNDAKQQLTADTTELQTAVQQLNRRGDTNNKKPRSINAYN 
KAIQSLETQITSAKDNANAVIQKPIRTVQEVNNALQQVNQLNQQLTEAINQLQPLSNNDA 
LKAARLNLENKINQTVQTDGMTQQSIEAYQNAKRVAQNESNTALALINNGDADEQQITTE 

TDRVNQQTTNLTQAINGLTVNKEPLETAKTALQNNIDQVPSTDGMTQQSVANYNQKLQIA 
KNEINTINNVLANNLDVNAIKTNKAEAERISNDLTQAKNNLQVDTQPLEKIKRQLQDEID 
QGTNTDGMTQDSVDNYNDSLSAAIIEKGKVNKLLKRNPTVEQVKESVANAQQVIQDLQNA 
RTSLVPDKTQLQEAKNRLENSINQQTDTDGMTQDSLNNYNDKLAKARQNLEKISKVLGGQ 
PTVAEIRQNTDEANAHKQALDTARSQLTLNREPYINHINNESHLNNAQKDNFKAQVNSAP 

NHNTLETIKNKADTLNQSMTALSGSIADYENQKQQENYLDASNNKRQDYDNAVNAAKGIL 
NQTQSPTMSADVIDQ 

Sequence 3155 

Contig_0681_pos_7383_6148 

is similar to (with p-value 4.0e-56) 

>sp:sp|P71359|RECQ_HAEIN ATP-DEPENDENT DNA HELICASE RECQ (E 
C 3.6.1.-). >gp:gp|U32756|U32756_4 Haemophilus influenzae Rd 
section 71 of 163 of the complete genome. NID: g1573729. 
atgtctaagttatcaattggacaaaatgatgttgtcaaaacaagtactaaaagacgcaat 

ttaatcttcaaagtcaatccgacttatcagcgacaaaaatttgttgtggattatgttgca 
aatcatgaaggacaggcaggaatcatttattgttccactcgtaagcaggtagaagaatta 
cacgaagctctaaatagtgaaaaaattaagagcacaatttatcatgctggtttaacgaat 
aaagagagaattgaggcgcaaaatgatttcttgtatgatcgtgtagaggttgtcattgcg 
acaaatgcatttggtatgggtattgataaatcaaatgtacgttatgtcattcactataac 

atgcctggagatttggaatcttactatcaggaagctggacgcgcgggacgtgatggttta 
aaaagtgagtgtatccttttgtttagtgaacgagataagggattacatgagtattttatt 
accgtatcccaagctgatgatgactataaagataaaatgggcgaaaaattaacgaaaatg 
attcaatataccaaaacgaaaaagtgtttagaagcgacaattgttcattattttgaaccc 
aatgaaaatttagaggaatgcaatcaatgtagtaattgtatacaggaaaataaaacgtat 
gatatgactcgtgaagcgaaaatgattattagctgtattgctcgaatgaagcaacaggaa 
aattatagtgttattatacaagttttacgtggagaagtgacagattatataaaacaccat 
cattataatgaattaacgacacatgggttgatgaaaaattatacaacatctgagttatca 
cacttaattgatgagctacgtttcaaaggatatttaaatgaaaatgatgaaattcttatg 
tgtgatacatcagtgaaacaattactaaataatcataccaaggtttataccactccgttc 
aaacaaaaaactaaagagaaggtatttatcaacactgttgaaggtgtggatagagcgtta 
tatcgtgagcttgttgatgtacgtaaacagctaagtgataaacttggaatagcacctgta 
agtatattttctgattacacgctcgaagaatttgctaagcgtaaacctgaatcgaaacaa 
gaaatgattgctattgatggtgtaggtagttataaattaaagcattattgtcctaagttt 
atcgaaaccatacaaagctataaaactagaatataa 

Sequence 3156 

MSKLSIGQNDVVKTSTKRRNLIFKVNPTYQRQKFVVDYVANHEGQAGIIYCSTRKQVEEL 
HEALNSEKIKSTIYHAGLTNKERIEAQNDFLYDRVEVVIATNAFGMGIDKSNVRYVIHYN 
MPGDLESYYQEAGRAGRDGLKSECILLFSERDKGLHEYFITVSQADDDYKDKMGEKLTKM 
IQYTKTKKCLEATIVHYFEPNENLEECNQCSNCIQENKTYDMTREAKMIISCIARMKQQE 
NYSVIIQVLRGEVTDYIKHHHYNELTTHGLMKNYTTSELSHLIDELRFKGYLNENDEILM 
CDTSVKQLLNNHTKVYTTPFKQKTKEKVFINTVEGVDRALYRELVDVRKQLSDKLGIAPV 
SIFSDYTLEEFAKRKPESKQEMIAIDGVGSYKLKHYCPKFIETIQSYKTRI* 

Sequence 3157 
Contig_0683_pos_4043_4831 
is similar to (with p-value 1.0e-70) 
>sp:sp|P23355|PTFB_XANCP PTS SYSTEM, FRUCTOSE-SPECIFIC IIBC 
COMPONENT (EIIBC-FRU) (FRUCTOSE-PERMEASE IIBC COMPONENT) ( 
PHOSPHOTRANSFERASE ENZYME II, BC COMPONENT) (EC 2.7.1.69) (E 
II-FRU). >pir:pir|B40944|B40944 phosphotransferase system en 
zyme II (EC 2.7.1.69), fructose-specific - Xanthomonas campe 
stris pv. campestris >gp:gp|M69242|XANFRUKAA_3 X.campestris 
1-phosphofructokinase (fruK) and PTS enzyme-II fructose (fru 
A) genes, complete cds. NID: g155366. 

gtggttttgttgagtaactcaaaaaaagtagccgttgtcacaggtgcagcacaaggtatt 
ggcttgaaaattgctgagcgtctatttgaagacggatatagcatcgcgcttgtagacttt 
aatgaagaggtagctaaagagtcagctgaaaaattatcaaaagaagggcaagaggcagtt 

gcttttaaagcagacgtttcaaatcgcgatcaagtatttagtgtgttaaatcaagtcgtt 
gaacactttggcgatttaaatgtcctagttaataatgctggtcttggaccaatgacacca 
attgaatcagtaacacctgaacaatttaatcaagttgtaggtgttaacgtaggtggtgta 
ttctggggtatccaagctgcaattgaacaatttgataaattaggacatggcggtaaaatt 
atcaatgccacatctcaagcaggtgttgaaggtaatgctggcttatctctatatagcagt 

actaagtttgctgttagaggattgactcaagtagcagcacgtgatttagctgagaaaaat 
attacagtcaacgcattcgcacctggtattgttgaaacaccaatgatgaaaggtatcgct 
gaaaagcttgctgaggaaaataaccaaccaatggaatggggttggaaacaatttacagat 
caaattgccttaaaacgcttatctaaacctgaagatgtagctaatgtagtaagcttctta 
gcaggtagcgattcagattatattactggccaaacaatcatcgttgacggtggtatgaga 

ttccactaa 

Sequence 3158 

VVLLSNSKKVAVVTGAAQGIGLKIAERLFEDGYSIALVDFNEEVAKESAEKLSKEGQEAV 
AFKADVSNRDQVFSVLNQVVEHFGDLNVLVNNAGLGPMTPIESVTPEQFNQVVGVNVGGV 
FWGIQAAIEQFDKLGHGGKIINATSQAGVEGNAGLSLYSSTKFAVRGLTQVAARDLAEKN 
ITVNAFAPGIVETPMMKGIAEKLAEENNQPMEWGWKQFTDQIALKRLSKPEDVANVVSFL 
AGSDSDYITGQTIIVDGGMRFH* 

Sequence 3159 
Contig_0683_pos_0_1268 

is similar to (with p-value 3.0e-73) 

>sp:sp|Q48436|BUDC_KLEPN ACETOIN(DIACETYL) REDUCTASE (EC 1. 
1.1.5) (ACETOIN DEHYDROGENASE) (AR). >gp:gp|D86412|D86412_1 
Klebsiella pneumoniae gene for meso-2,3-butanediol dehydroge 
nase (D-acetoin forming), complete cds. NID: g1468938. 
gtgctcacttctaaagaaatcaaagaagctgatggaatcatcattgctgccgatagacag 
gtagatttatcaaggtttaatggtaaacctctcatcaatgaaagtgtacgtgaaggtatt 
catagacccaaggaattaatacaacgtgtgattgaccaagatgcacaaatttaccatgat 
5 caaaatatttcttcaaatatgtctagagaccaggaagaatctcataaaagtaatattcaa 
atggtatatcagcatttaatgaatggtgtttccttcatggttccatttatcgtcgttggc 
ggtttactcatagctattgccttaactcttggaggacacaccactccaaaaggattagtt 
atccccgaagattcattttggaaatctattgaaaatattggtagtttatcgtttaaattc 
atggttcccatccctgctggttatatcgcggtgagtattgctgataagcctggtcttgtt 

ccaggtatgattggtggtgccattgctgctgatggtagtttatatggaagtgaagcagga 
gccggtttccttggtggtatcgtcgcaggtttcctagcgggctatattgcaaaatggatt 
aaacagattaaagttcctaaagctatggctcctattatgcctattattattatacctatt 
ctatcttctttaatagttggtctcatttttatatttgtaataggcgcaccaatttcaaat 
atatttggtgcattaacatcatggttaaaaggaatgcaaggtgctaacatcattattctt 

gctcttattattggcgcgatgattgcttttgatatgggaggtccagtaaacaaagtagca 
ttcttattcggttctgcattaattgctgaaggcaactacactgtgatgggaatggttgct 
gtagcagtatgtacaccaccgattggtttaggtttagctacatttgttcgtaaacaccaa 
ttcaataaagcagaacaagaaatgggtaaggcatcatttacgatgggattatttggtatt 
actgaaggggcaatcccttttgctgcacaagatcctctaagaatcattccagccaatatg 

attggcgcgatgattgcttcagtaatagcggcggttggaggtgtcggtgataaagttgct 
catggaggtcctattgtcgctgtactaggtggaataagtaatattttatggttctttata 
gctgttgtcgttggaagcttagtaaccatgttcacagtcttgttatttaagcgtcacacc 
cctgcTTT 

Sequence 3160 

VLTSKEIKEADGIIIAADRQVDLSRFNGKPLINESVREGIHRPKELIQRVIDQDAQIYHD 
QNISSNMSRDQEESHKSNIQMVYQHLMNGVSFMVPFIVVGGLLIAIALTLGGHTTPKGLV 

IPEDSFWKSIENIGSLSFKFMVPILAGYIAVSIADKPGLVPGMIGGAIAADGSLYGSEAG 
AGFLGGIVAGFLAGYIAKWIKQIKVPKAMAPIMPIIIIPILSSLIVGLIFIFVIGAPISN 
IFGALTSWLKGMQGANIIILALIIGAMIAFDMGGPVNKVAFLFGSALIAEGNYTVMGMVA 
VAVCTPPIGLGLATFVRKHQFNKAEQEMGKASFTMGLFGITEGAIPFAAQDPLRIIPANM 
IGAMIASVIAAVGGVGDKVAHGGPIVAVLGGISNILWFFIAVVVGSLVTMFTVLLFKRHT 
PAX 

Sequence 3161 

Contig_0687_pos_5614_4433 

is similar to (with p-value 4.0e-55) 
>sp:sp|Q44681|RISB_BACAM 6,7-DIMETHYL-8-RIBITYLLUMAZINE SYN 
THASE (EC 2.5.1.9) (DMRL SYNTHASE) (LUMAZINE SYNTHASE) (RIBO 
FLAVIN SYNTHASE BETA CHAIN). >gp:gp|X95955|BARIBGENS_4 B.amy 
loliquefaciens ribB, ribG, ribA, ribH & ribT genes. NID: g15 
92687. 

atgcaattcgatacaattgagttggctatagaggctttaagaaatggagagagcattatt 
gtagttgacgatgaagatagagaaaatgaaggagatcttgtagctgttacggaatggatg 

gatgataataccattaattttatggctaaagagggtcgtggtctgatttgtgcaccaatt 
gataaatctatagctgaaagattaaaactacaatctatggagcaaaataacactgatatt 
tatggcacacattttactgtaagcattgatcattataaaactactacaggaatcagtgca 
catgaacgtacacaaacggctagagcactcatagatgaaaatactaatcctgaagatttt 
catcgtccggggcacttatttccacttatagcaaaagagaatggtgtgttaacacgtaat 

ggtcatactgaagctgccgtagatttggcacggttaacaggagcacaaccagctggagta 
atctgcgaaattatgaatgatgatgggacaatggctaagggtgaagatctccagtcattt 
aaagaacgccaccatttaaaaatgattactataaaaagtttggttgcttttcgtaaggct 
gttgaacttaatgttaatcttaaggcaaaggtcaagatgccaactgattttggtcatttt 
gatatgtatggatttacaacggattatagcgatgaagaaatcgtagctattgttaaagga 

gatttaaaaagcaatcctaatgtacgtatgcattctgcttgtctgactggggatattttt 
catagtcaaagatgtgattgcggggcacaacttgaagcgtcaatgaaatatattgacgaa 
catggtggaatgattatttatttacctcaagaaggtagaggaatagggttaattaataag 
ttgcgcgcatatgagttgatagaaaaaggttatgatacagttactgcaaatcttgctctt 
ggttttgatgaggatttgagagattatcatgttgcagctgaaatattaaagtattttgat 
ataagtgaaattaacttgctcagcaataatcctaaaaaatttgaaggtttagaagattac 
ggcattgagatcgtagatagaattgaacttatcgttccagaaacacaatataaccatagt 
tatatggaaactaaaaaaaataaaatgggacatttaatatag 

Sequence 3162 

MQFDTIELAIEALRNGESIIVVDDEDRENEGDLVAVTEWMDDNTINFMAKEGRGLICAPI 
DKSIAERLKLQSMEQNNTDIYGTHFTVSIDHYKTTTGISAHERTQTARALIDENTNPEDF 
HRPGHLFPLIAKENGVLTRNGHTEAAVDLARLTGAQPAGVICEIMNDDGTMAKGEDLQSF 
KERHHLKMITIKSLVAFRKAVELNVNLKAKVKMPTDFGHFDMYGFTTDYSDEEIVAIVKG 
DLKSNPNVRMHSACLTGDIFHSQRCDCGAQLEASMKYIDEHGGMIIYLPQEGRGIGLINK 
LRAYELIEKGYDTVTANLALGFDEDLRDYHVAAEILKYFDISEINLLSNNPKKFEGLEDY 
GIEIVDRIELIVPETQYNHSYMETKKNKMGHLI* 

Sequence 3163 

Contig_0687_pos_4420_3959 

>sp:sp|P51695|GCH2_BACAM GTP CYCLOHYDROLASE II (EC 3.5.4.25 
) / 3,4-DIHYDROXY-2-BUTANONE 4-PHOSPHATE SYNTHASE (DHBP SYN 
THASE). >gp:gp|X95955|BARIBGENS_3 B.amyloliquefaciens ribB, 
ribG, ribA, ribH & ribT genes. NID: g1592687. 

atgaattttgaaggtaaattagttggtaaggatttaaaaattgcgattgttgttagtaga 
tttaatgattttattactacacgtctacttgaaggggctaaagatacacttattcgtcat 
gaagtagaagatacaaatattgatgtagcttatgtgcctggcgcattcgaaattccactc 
gttgcaaaaaaattagctcaaaaaggtgaatatgatgctgtgattacattaggatgtgtg 
attagaggcgcaacttcacattatgactatgtatgtaatgaagtagctaaaggtgtttct 

aaagcaaacgacatttcagatactccggtgatttttggagttctaacaactgaaagtatt 
gaacaagcagttgaaagagctggtactaaagctggaaataaaggttcagaagcagcagtt 
agtgcaatcgaaatggctaatctaattaagcaaatcaattaa 

Sequence 3164 

MNFEGKLVGKDLKIAIVVSRFNDFITTRLLEGAKDTLIRHEVEDTNIDVAYVPGAFEIPL 
VAKKLAQKGEYDAVITLGCVIRGATSHYDYVCNEVAKGVSKANDISDTPVIFGVLTTESI 
EQAVERAGTKAGNKGSEAAVSAIEMANLIKQIN* 

Sequence 3165 
Contig_0691_pos_489_154 

is similar to (with p-value 2.0e-21) 

>pir:pir|A55345|A55345 diamine N-acetyltransferase (EC 2.3. 
1.57) - Escherichia coli >gp:gp|D25276|ECOSNlA_1 Escherichia 
coli gene for spermidine acetyltransferase, complete cds. N 
ID: g517104. >gp:gp|AE000254|AE000254_5 Escherichia coli K-1 
2 MG1655 section 144 of 400 of the complete genome. NID: g17 
87862. 

gtggtagagttattagaaattaactttatacatagaacttgtgaagtgttaattattatc 
gatccgcagtatgcaaataatgggtacgcgaaaaaagcctttaaaatggctattgactat 
gcttttttagtattaaatatgaataaggtatacttatatgtggatattaagaatgagaaa 
gcagtacatatctatcaaagtaataatttcgaaatagaaggaacgttaaaggaacacttc 
tatacaaggggagaatatagagattgctatgtaatgggcttgttaaaaaggaattgggtt 
aataagaatgatgatgatttgtctcatataagatga 

Sequence 3166 

VVELLEINFIHRTCEVLIIIDPQYANNGYAKKAFKMAIDYAFLVLNMNKVYLYVDIKNEK 
AVHIYQSNNFEIEGTLKEHFYTRGEYRDCYVMGLLKRNWVNKNDDDLSHIR* 

Sequence 3167 
Contig_0692_pos_844_1881 

is similar to (with p-value 9.0e-27) 

>sp:sp|P45578|YGAG_ECOLI 19.3 KD PROTEIN IN EMRB-GSHA INTER 
GENIC REGION. 

atgcataataaacaaaagatattagattttatagaaaataataaatatgattatgttgaa 
ataagtcatcgtattcatgaacgccctgaattaggcaatgaagaaatttttgcatcgaga 
acattaattgaccaattaagagcaaatcgattcgaaatcgaaacggatattgcaggacat 

gcaacaggatttatagcaacgtatgattctgatatgactggaccggttataggatttcta 
gctgaatacgatgctttacctggtcttggtcacgcatgcgggcataatattattggtact 

gctagcgtacttgctgcagtagcactaaaagaagtcgtcgatgaaattggtggtaaagta 
gtcgttttgggatgtcctgctgaagaaggtggggaaaatggctccgcaaaagcttcttat 
gttaaagcaggtgtcattgatgaaattgatgtagcattgatgattcatcctggaaatgaa 
acttatcgtacaattaatactttagctgtggatgttctagatattaaattctatggacgt 
agtgcgcatgcatctgaaaatgcagatgaagcattaaacgctttagatgcaatgatttca 

tatattaatggtatagcacagttaaggcaacacattaaaaaaggacaacgagttcacggg 
gttattttagacggtggtaaagcggctaatattatacctgattttacacatgcgagattt 
tacactcgagctacttcgcggagagaacttgatgttttaactgaaaaagtaaaccaaatt 
gcaagaggtgctgctattcaaactgggtgtgattttgaatttggtcctatccagaatggt 
gtaaacgaatttatcaaagcacctaaacttgatgatttatttgaaaaatatgcaactgaa 

ttaggagaagaagtgattgatgatgattttggctatggatctacagatacaggtaatgta 
agtcatgttgtaccaacaatacatccacatattaaaattggttctcgaaatcttgttagg 
acatacccaccgctttag 

Sequence 3168 

MHNKQKILDFIENNKYDYVEISHRIHERPELGNEEIFASRTLIDQLRANRFEIETDIAGH 
ATGFIATYDSDMTGPVIGFLAEYDALPGLGHACGHNIIGTASVLAAVALKEVVDEIGGKV 
VVLGCPAEEGGENGSAKASYVKAGVIDEIDVALMIHPGNETYRTINTLAVDVLDIKFYGR 
SAHASENADEALNALDAMISYINGIAQLRQHIKKGQRVHGVILDGGKAANIIPDFTHARF 
YTRATSRRELDVLTEKVNQIARGAAIQTGCDFEFGPIQNGVNEFIKAPKLDDLFEKYATE 

LGEEVIDDDFGYGSTDTGNVSHVVPTIHPHIKIGSRNLVRTYPPL* 

Sequence 3169 
Contig_0692_pos_609_148 
is similar to (with p-value 8.0e-17) 
>gp:gp|AF006687|AF006687_1 Enterobacter agglomerans indole- 
3-acetyl-L-aspartic acid hydrolase gene, complete cds. NID: 
g2654566. 

atgaatgtagaaagctttaatttagaccatactaaggttgttgcaccttttattcgtcta 
gccgggactatggaaggtcttaatggtgatgtcatacacaaatatgacattcgtttcaaa 

cagcccaataaggaacatatggatatgcctggtctacattccttagagcatttaatggca 
gaaaacattagaaatcatactgataaagtagtagatttaagtcctatgggttgtcaaact 
ggattctatgtttcatttattaatcatgacgactacgatgacgtattaaatattatcgat 
caaacattgcatgatgtgttaaatgctagcgaagtcccagcttgtaatgaggttcaatgt 
ggttgggctgcaagtcattctttagaaggtgctaaaacaattgctcaagcatttttagat 

aaaagagagcaatggaatgacatctacggagaaggtaaataa 

Sequence 3170 

MNVESFNLDHTKVVAPFIRLAGTMEGLNGDVIHKYDIRFKQPNKEHMDMPGLHSLEHLMA 
ENIRNHTDKVVDLSPMGCQTGFYVSFINHDDYDDVLNIIDQTLHDVLNASEVPACNEVQC 
GWAASHSLEGAKTIAQAFLDKREQWNDIYGEGK* 

Sequence 3171 
Contig_0693_pos_5566_6633 
is similar to (with p-value 9.0e-84) 
>gp:gp|X94433|BCASPAMIN_1 B.circulans aspartate aminotransf 
erase gene. NID: g1147556. 

atgctaatagataaagcaagatcatttattcagaccatgtatagcgaattaaaatataat 
actaatgaaattgaaaatagaatgaaagagattgagcaagaaattaacttgactggtagt 

tacacacatacttatgaagaattatcttacggtgcaaaaatggcatggagaaactcaaat 
cgttgtattggtagactgttttggaattctttaaatgttaaagatgcccgagatgtatgt
gacgaaaaagaatttataaaatttatacatacacatattaaagaagctactaacggcgga
aaaatcaaaccatatattacaatttttagtcctgaagatacacctaaaatttataataat 
cagttgattcgttatgctggttatgaaaatgttggcgatccatctgaaaaaaaggttact 
cgtttagctgaacatctaggttggaaaggtaaaggttcaaattttgatattttacctctg
atttatcaattgcctaacgacactataaaaatacacgaacttccaaatgatattgtaaaa
gaagtttctatacatcatgaacactatcccaagctttcaaaattaggtttaaaatggtat 
gcggtacctattatttcaaatatggatttaaaaatcggtggtattacttaccctacagca 
ccttttaatggatggtatatggtaaccgaaattgctgtacgtaatttcacagacacctat

cgttataatcttttagaaaaagttgcagaagcttttgaatttgatacacttaaaaataat
tcatttaataaagatcgagcactcgtagagttaaatcatgctgtgtatcattcatttaaa 
gctgatggtgtttctattgttgaccacttaactgcagcgaagcaatttgaaatgtttgaa 
cgaaatgaacatcaacaaaacagaaatgttactggtaagtggtcttggctggcacctcca 
ctttcaccaactttaacttctaactatcatcatggatatgataatacaatgcatcatacg 

aatttcttctataaaaaagaagaacctatgaagtgccctttccattaa

Sequence 3172 

MLIDKARSFIQTMYSELKYNTNEIENRMKEIEQEINLTGSYTHTYEELSYGAKMAWRNSN 
RCIGRLFWNSLNVKDARDVCDEKEFIKFIHTHIKEATNGGKIKPYITIFSPEDTPKIYNN
QLIRYAGYENVGDPSEKKVTRLAEHLGWKGKGSNFDILPLIYQLPNDTIKIHELPNDIVK
EVSIHHEHYPKLSKLGLKWYAVPIISNMDLKIGGITYPTAPFNGWYMVTEIAVRNFTDTY 

RYNLLEKVAEAFEFDTLKNNSFNKDRALVELNHAVYHSFKADGVSIVDHLTAAKQFEMFE 
RNEHQQNRNVTGKWSWLAPPLSPTLTSNYHHGYDNTMHHTNFFYKKEEPMKCPFH*

Sequence 3173
Contig_0693_pos_12681_13970
>gp:gp|U13618|SEU13618_2 Staphylococcus epidermidis 9759 he
at shock protein 10 (hsp10) and heat shock protein 60 (hsp60
) genes, complete cds. NID: g535340.

atgatgaatccattagcccaaaaattgaatgatgaaataaagcaatcaagtccagaagta
ttagatatgatgtcacaattaggtaaagatatgttttatccaaaaggaattttatcgcaa 
tctgccgaagcaaaacgcacaacatataatgctactattggtatggcaaccaaaaaagaa 
ggtaaaatgtacgcaaattcacttaaccaaatgtttaatgaccttacaccggatgaaatt 
ttcccatatgcacctcctcaaggtgtagaggaattacgtgatttatggcagaaaaaaatg 

cttaaagaaaatcccgacttaaagtctaaatctatctctcgtcccatcgttacaaatgct
ctcacgcacggtctttctctagtagctgatttatttgtagatacagatgataccgtctta 
ttaccgacacacaactggggtaattataaacttgtatttagcacacgtcacggtgctcat 
atcaatacgtattctatttttgatgactcaggtcacttcactacatctgaacttgtaaaa 
acattaaaagaatataaaaaagacaaagtgattattattttaaattatcctaataaccca 

actggttacacaccaaataaaaaagaagttaatactattgtaaatgcaattgaagaacta
gctaataaaggtactaaagtagtaactgttgtcgatgatgcatactatgggttattttat 
gaagaagtttaccaacagtcaattttcacggctttaacacaggtgaaatcttctaacctt 
ttaccagtgcgtttggatggagctactaaagaattcttctcttgggggttccgagttggc 
tttatgacgtttggaattgatcatgaaacgttaaaaaatgcgctagaagctaaagtaaaa 

ggattaattcgtagcaatatttcaagttctccactaccttctcaaagtgcaatcaaacat
gtacttaaatatcatgagcaatttgataaagaaatcgatcaaaatatcaatattttaaaa 
gaacgctacgaagtaactaaacaagtagtgtatgataataaatatgccaaatattggcaa 
gcctatgactttaattcaggatactttatgtcattgaaattaaatcaagtcgatccagaa 
gaattacgtaaacatttaattaataactattcaattggtattattgctttaaatagtaca 

gatattcgtattgcctttagttgtgtagaaaaagaagatattccttatgtctttgagtct
attgctaatgcaattgatgatattaaataa 

Sequence 3174 

MMNPLAQKLNDEIKQSSPEVLDMMSQLGKDMFYPKGILSQSAEAKRTTYNATIGMATKKE 
GKMYANSLNQMFNDLTPDEIFPYAPPQGVEELRDLWQKKMLKENPDLKSKSISRPIVTNA
LTHGLSLVADLFVDTDDTVLLPTHNWGNYKLVFSTRHGAHINTYSIFDDSGHFTTSELVK
TLKEYKKDKVIIILNYPNNPTGYTPNKKEVNTIVNAIEELANKGTKVVTVVDDAYYGLFY
EEVYQQSIFTALTQVKSSNLLPVRLDGATKEFFSWGFRVGFMTFGIDHETLKNALEAKVK
GLIRSNISSSPLPSQSAIKHVLKYHEQFDKEIDQNINILKERYEVTKQVVYDNKYAKYWQ
AYDFNSGYFMSLKLNQVDPEELRKHLINNYSIGIIALNSTDIRIAFSCVEKEDIPYVFES
IANAIDDIK*

Sequence 3175 

Contig_0693_pos_16689_15784
unknown; although very good hit (p=0) to B. subtilis genome

gtgcaggctagtcgaccaattttaattgttgcggatgaagtagaaggcgatgcacttact 
aatattgttttaaaccgtatgcgtggaacatttactgctgtagcagttaaagccccagga 
tttggtgatcgacgtaaagcaatgttagaagacctagcaatattaactggtgctcaagtc
attactgatgatttaggtttagaacttaaagatgcatctcttgatatgctaggtactgct 
aataaagttgaagtgactaaagatcatacaacagtcgtagatggtaatggtgatgaaaat 
aatattgatgctcgtgtaggtcaaattaaagcacaaattgaagaaactgattcagagttt 
gataaagaaaaattacaggaacgtttggcaaaactagctggcggcgtagctgttatcaaa 

gtaggggctgcaagtgaaacagagcttaaagaacgtaaattaagaattgaagacgcatta
aattcaacacgtgcggcggtggaagaaggtatcgttgctggtggtggtactgcgttagtc 
aatatatatcaaaaagtaagtgaaattaaagcagaaggtgatgttgaaacgggtgttaat 
atcgtattaaaagcattacaagcacctgttagacaaattgctgaaaatgcaggattagag 
ggttcaattattgttgaacgtttaaaacatgctgaagcgggcgttggtttcaatgcagca 

acaaatgaatgggttaatatgttagaagaaggtatagtagatccaactaaagtaactcgt
tcagcgttacaacatgcagcaagtgtagctgctatgttcttaacaactgaagcagtcgtt 
gctagtattccagagccagaaaataatgaacaacctggaatgggtggcatgccaggtatg 
atgtaa 

Sequence 3176

VQASRPILIVADEVEGDALTNIVLNRMRGTFTAVAVKAPGFGDRRKAMLEDLAILTGAQV 
ITDDLGLELKDASLDMLGTANKVEVTKDHTTVVDGNGDENNIDARVGQIKAQIEETDSEF
DKEKLQERLAKLAGGVAVIKVGAASETELKERKLRIEDALNSTRAAVEEGIVAGGGTALV 
NIYQKVSEIKAEGDVETGVNIVLKALQAPVRQIAENAGLEGSIIVERLKHAEAGVGFNAA 

TNEWVNMLEEGIVDPTKVTRSALQHAASVAAMFLTTEAVVASIPEPENNEQPGMGGMPGM
M* 

Sequence 3177
Contig_0693_pos_5374_3905
is similar to (with p-value 3.0e-77)
>gp:gp|U59924|SSU59924_1 Sus scrofa nitric oxide synthase (
NOS) mRNA, complete cds. NID: g1762433.

gtgtaccaatataacgacgatagcttaatgttacacaatgatttatatcaaattaatatg 
gctgaaagctactggaatgatggtatccatgaaagaatagcagtgtttgatttgtatttt 

cgaaaaatgccatttaatagtggatatgcggtattcaacggattgaaacgcgttgtgaat
ttcatcgaaaactttgggtttacaaatgaagatatcacatatttaaaatcgataggttat 
gaagaagattttctaaattacctaaaagatttgaaatttacagggaatattaaatctatg 
caagaaggtgaaatttgttttggtaatgagccattattaagagttgaagcacctttaatc 
caagcgcaacttattgaaactattttgttaaatatcattaatttccaaacattaattgca 

actaaagccagccgaattcgtcaaatagcaactcatgacactttgatggaatttggtaca
agaagagctcaagagatcgatgctgcactgtggggcgctagagcagcctttattggaggg 
tttgattctacaagtaatgttagagcaggaaaactttttaatatacctgtatctggcaca 
catgcacacgcactagtacaaacatatggtgatgagtatatagcattcaaaaagtatgct 
gagcgacataaaaattgtgtgttcttagttgatacttttcatactttaaaatcaggagta 

ccaaccgcaattaaggttgcaaaagagttaggagatactattaattttataggtatcaga
ttagattctggtgatattgcgtacctatctaaagaagctcgtagaatgttagatgaggct 
ggttttacagaagctaaaattatcgcatcaaatgatttggatgagcagactattacaagt 
ttaaaagcacaaggcgctaaagttgacggatggggagtaggtacaaaactgattacagga 
tatgatcaaccagccttaggtgcagtttataaattggtttctattgaaacagatgatggc 

acaatgagtgatcgcattaaattatcaaataatgctgagaaagttactacaccaggcaaa
aaaaatgtttatcgtattattaataataaaacaggcaaggctgagggcgactatattacg 
ctagaaggtgaaaatcctaatgacgaatctccattgaaaatgttccatcctgttcacact 
tacaaaatgaagtttattaaatcatttaaagcggttaatctacatcaatctatatttgaa 
aatggcaaacttgtataccatcttccagatgaatatgaagctcaggactatcttaaaaat 

aatttaagtattttatgggaagaaaataaacgatatcttaacccgcaagattatccagta
gatttaagcactaaatgttgggaaaataagcataagcgtatttttgaagttgctgaacac 
gttaaagagatggaggatgaaaatgagtag 

Sequence 3178

VYQYNDDSLMLHNDLYQINMAESYWNDGIHERIAVFDLYFRKMPFNSGYAVFNGLKRVVN
FIENFGFTNEDITYLKSIGYEEDFLNYLKDLKFTGNIKSMQEGEICFGNEPLLRVEAPLI
QAQLIETILLNIINFQTLIATKASRIRQIATHDTLMEFGTRRAQEIDAALWGARAAFIGG
FDSTSNVRAGKLFNIPVSGTHAHALVQTYGDEYIAFKKYAERHKNCVFLVDTFHTLKSGV
PTAIKVAKELGDTINFIGIRLDSGDIAYLSKEARRMLDEAGFTEAKIIASNDLDEQTITS
LKAQGAKVDGWGVGTKLITGYDQPALGAVYKLVSIETDDGTMSDRIKLSNNAEKVTTPGK
KNVYRIINNKTGKAEGDYITLEGENPNDESPLKMFHPVHTYKMKFIKSFKAVNLHQSIFE
NGKLVYHLPDEYEAQDYLKNNLSILWEENKRYLNPQDYPVDLSTKCWENKHKRIFEVAEH
VKEMEDENE*


Sequence 3179 
Contig_0694_pos_3485_4648
is similar to (with p-value 2.0e-36)
>sp:sp|P31448|YIDK_ECOLI HYPOTHETICAL 62.1 KD PROTEIN IN EM
RD-GLVG INTERGENIC REGION. >gp:gp|L10328|ECOUW82_44 E. coli;
the region from 81.5 to 84.5 minutes. NID: g290484. >gp:gp|
AE000445 |AE000445_8 Escherichia coli K-12 MG1655 section 335
of 400 of the complete genome. NID: g1790105.
gtgttcaataaaatgtttaaagttgatgaatacttaggtgtaagtagttcaactgctgtc 

attattatttcatctattattggtataattggcattatttacttatttataggtggttta
tcgttaagtgcttttagtgattcgatttatggcatggctttaattatagctggacttgcg 
attacaatattaggtctaggtcaattaggagatggcaacttcctacatggtttcgacaaa 
atcgtgcaagacacgcctgagaaattgaatggttttggtaaggtggactcggatgttgta 
ccttggccaaccctattcttcggtatgttctttaacaatttattcttctggtgcgcaaac 

cagatgatagttcaaaaagcactcgcagctaaaaatttaaaagaatctcaaaaaggtgca
atatatttaagcttatttaaagtgttcggacctttatttacagtcttaccaggcgtagta 
gcatttaactattttaatggtagtcttgataaatcagataacgcttaccctgcacttgta 
acttcagtattaccagaatgggcattcggcttatttggtgcggttatttttggtgcaata 
ttgagctcatttgttggctcattgaatagtacaactacactattaacactcgatttctat

aaacctatttttggaaaaaataaatcagataaacatattgctcgagtaggccatattgct
actgtagtcattggagttattgttgtagcacttgcaccagtcatctcattattccctagt
ggtctttatgcagtagttcaacagtttaatggtgtgtatagtatgccagtgctagcttta 
attttaatcgctttcttttctaaacgcacatctaaattgggcgctaaagtgacactcttc 
acacatataattttatacgctataatcagctttgtatttacagaaattaattatctatac 

acatttagtgtattattctttgtagatttaattattattttgatttttaacaaagttaaa
ctatctagtgagtttgatttaagcacgcaccaaccgaaagtagatatgacgccatggaaa 
tatcgttacgttgcaggtattattgttcttgctttagtagtagtaagttatattatcttc 
tcaccactcgtgttggcaaaataa 

Sequence 3180

VFNKMFKVDEYLGVSSSTAVIIISSIIGIIGIIYLFIGGLSLSAFSDSIYGMALIIAGLA 
ITILGLGQLGDGNFLHGFDKIVQDTPEKLNGFGKVDSDVVPWPTLFFGMFFNNLFFWCAN
QMIVQKALAAKNLKESQKGAIYLSLFKVFGPLFTVLPGVVAFNYFNGSLDKSDNAYPALV
TSVLPEWAFGLFGAVIFGAILSSFVGSLNSTTTLLTLDFYKPIFGKNKSDKHIARVGHIA 

TVVIGVIVVALAPVISLFPSGLYAVVQQFNGVYSMPVLALILIAFFSKRTSKLGAKVTLF
THIILYAIISFVFTEINYLYTFSVLFFVDLIIILIFNKVKLSSEFDLSTHQPKVDMTPWK
YRYVAGIIVLALVVVSYIIFSPLVLAK*

Sequence 3181 
Contig_0699_pos_1788_1444
is similar to (with p-value 4.0e-59)
>pir:pir|I67760|I67760 transposase (insertion sequence IS10
) - Escherichia coli >gp:gp|S67119|S67119_2 BST=somatotropin
...BST/beta-Gal fusion protein [Escherichia coli, LBB84, pla
smid pXT107, IS10L/R-1, PlasmidTransposonInsertionMutant, 3
genes, 1679 nt]. NID: g455674.

atgcagattgaagaaaccttccgagacttgaaaagtcctgcctacggactaggcctacgc 
catagccgaacgagcagctcagagcgttttgatatcatgctgctaatcgccctgatgctt 
caactaacatgttggcttgcgggcgttcatgctcagaaacaaggttgggacaagcacttc
caggctaacacagtcagaaatcgaaacgtactctcaacagttcgcttaggcatggaagtt
ttgcggcattctggctacacaataacaagggaagactcactcgtggctgcaaccctgctt 
actcaaaatctattcacacatggttacgttttggggaaattatga 

Sequence 3182

MQIEETFRDLKSPAYGLGLRHSRTSSSERFDIMLLIALMLQLTCWLAGVHAQKQGWDKHF 
QANTVRNRNVLSTVRLGMEVLRHSGYTITREDSLVAATLLTQNLFTHGYVLGKL* 

Sequence 3183 
Contig_0702_pos_12730_11027

>sp:sp|P16659|SYP_ECOLI PROLYL-TRNA SYNTHETASE (EC 6.1.1.15
) (PROLINE--TRNA LIGASE) (PRORS) (GLOBAL RNA SYNTHESIS FACTO
R).

atgaaacaatctaaagtatttataccaacgatgagagaagtccctgcagaggcggaagca 

ttaagccatcgtttattattaaaagcagggttaattaaacagagtacaagtggtatatat
agtcacttaccacttgctacacgtgtactaaataatatatctaaaatcatacgtgaagaa 
atggaaagtatagatgctgtagaaattcttatgccagctttacaacaagcagaattatgg 
gaagagtcaggacgttggagtgcatatggtccagaactaatgcgtttaaaagacagaaac 
ggacgtgaatttgcattaggacctactcatgaggaagtagtcacttctttagtaagagat 

gaattaaaatcatataaacagttaccccttactttatttcaaattcaatccaaatataga
gatgagaaaagaccacgctttggattattaagaggacgcgaatttcttatgaaagatgca 
tattccttccattcagatgaagcttcattagatgcaacttatcaagatatgtatcaagca 
tatagtcgcatattcaaacgtgtaggcatcaatgcaagaccggttgtggcagattcaggt 
gcaatagggggaagtcatacacacgagtttatggcattgagtgaaattggggaagataca 

atagtttatagtaatgagagtgactatgcagcgaatattgaaaaggctgaagttgtttat
catccttctcataagcattctgcacttgcggaattgactaaagttgagacgcccaatgtt 
aaaacagctcaagaagttgcagaatatttaaagagaccattagatgaaattgtaaaaact 
atgatctttaaaatagatggcgaatttattatgtttctagttcgtggacatcatgaatta 
aatgaagtgaaattaaaatcatatttcggcacggaacatgttgaaatggctactccagat 

gaaattgttaatcttgtagatgccaatccggggtctcttggtcctatttttgataaagat
attaaaatttatgccgataattacttacaagatttaaataactttgttgtaggagctaat 
gaagatcattatcactatataaatgtcaatattggtagagactttgatgtaacagaatac 
ggtgactttagattcattacacaaggtgagatgttaagtgatggctcgggagtagcacaa 
tttgctgaaggcattgaagtaggacaagttttcaaattagggacaaaatattctgaatca 

atgaatgcaacttttctagataatcagggaaaagctcaaccactcattatgggctgttat
ggtattggagtatcaagaacattaagtgcaattgttgaacaaaacaatgacgagaatgga 
attatttggccaaaatcagtaacgcctttcgatatccatctaattactatcaatcctaaa 
aaggatgatcaacgtactttaggtgatcaactttatcaaaaattaatggattcatacgat 
gttttatatgatgaccgaaaagaacgtgctggtgttaaatttaatgattcagacctaatt 

gggttaccggtacgagttgttgttggtaaaagagctgaagaaggtattgttgaggtaaaa
caacgcattaacggtttaagtgaagaagtgcaaattgatgaattagagtattacttacaa 
gaattatttaagaatattaagtaa 

Sequence 3184 

MKQSKVFIPTMREVPAEAEALSHRLLLKAGLIKQSTSGIYSYLPLATRVLNNISKIIREE
MESIDAVEILMPALQQAELWEESGRWSAYGPELMRLKDRNGREFALGPTHEEVVTSLVRD
ELKSYKQLPLTLFQIQSKYRDEKRPRFGLLRGREFLMKDAYSFHSDEASLDATYQDMYQA
YSRIFKRVGINARPVVADSGAIGGSHTHEFMALSEIGEDTIVYSNESDYAANIEKAEVVY
HPSHKHSALAELTKVETPNVKTAQEVAEYLKRPLDEIVKTMIFKIDGEFIMFLVRGHHEL 

NEVKLKSYFGTEHVEMATPDEIVNLVDANPGSLGPIFDKDIKIYADNYLQDLNNFVVGAN
EDHYHYINVNIGRDFDVTEYGDFRFITQGEMLSDGSGVAQFAEGIEVGQVFKLGTKYSES 
MNATFLDNQGKAQPLIMGCYGIGVSRTLSAIVEQNNDENGIIWPKSVTPFDIHLITINPK 
KDDQRTLGDQLYQKLMDSYDVLYDDRKERAGVKFNDSDLIGLPVRVVVGKRAEEGIVEVK
QRINGLSEEVQIDELEYYLQELFKNIK* 


Sequence 3185 
Contig_0703_pos_2091_3398 
is similar to (with p-value 3.0e-36)
>sp:sp|Q57991 |AK_METJA PROBABLE ASPARTOKINASE (EC 2.7.2.4)
(ASPARTATE KINASE). >gp:gp|U67506|U67506_4 Methanococcus jan
naschii section 48 of 150 of the complete genome. NID: g1591
274.

gtgaatgaagaccctgaacgtaaaatcattatcgtttcagctccaggcaaaaggcataat 
gacgacattaaaactactgatttattaattcgtctctatgaaaaagtacttaataaatta
aattatgaaagtaaaaaacaagaaattatccaaagatatgctgatatagtagaagaatta 
ggtataggaaatgacattttaataacaattaatgacactttagaggaatacattaaacat 
ctttctgacaaacctaaccgtttatatgatgctttattatcttgtggcgaaaattttaat 
gctcaattaatagcccagtataataatagtcaaggtattcctactcgttatatttctcct 

aaagaagctggattaactgtaactgatttaccacagcaagctcaaattttagattccgca
tataatgaaatatacaaattgcgtgattatgatgaaaagctaattattcctggttttttc 
ggagtttcaaagcaaaattatatcgttacgtttccacgcggtggttctgacataactggt 
gctatcatagcacgtggcgtccgagcctcactttatgagaacttcactgatgtatcagga 
atatataaagctaatccgaatatcataaataatcctgaactcatagaggaaataacttat 

agagaaatgcgagagctatcttatgcaggatttggagtttttcacgatgaagctctacaa
cctttatacaaagatcgaattcccgtagttatcaaaaatactaatcgtccaaatgataaa 
gggacctacattttacatgaccgtgaaatcgattctaaaaatgtcattagtggaattagt 
tgtgataaaggctttactgtgattaatattaaaaaatatttaatgaatagattagttgga 
tttacacgaaagattcttggcgttttagaagaatttaatatatcatttgaccacatgcct 

tctggtattgataacataagtattatcatgcgtacaaatcaaattcaaggtaaagaaagt
caagttcttaatgccatacgcaaacgttgtgaagttgatgaattaagtatcgaccatgat 
ttagcagtactaatgattgttggtgaaggtatgaatcaagttgttggtacagctagtaaa 
attactcacgccctttcagaatcaaacattaatttaataatgattaaccaaggtgcttct 
gaaatttcaatgatgtttggaattcatgaagcagatgctgaaaaagcagtattatctacg 

tacgaattttgttacaacggtgtttgtctaaaaaatttgtgtaaataa

Sequence 3186 

VNEDPERKIIIVSAPGKRHNDDIKTTDLLIRLYEKVLNKLNYESKKQEIIQRYADIVEEL
GIGNDILITINDTLEEYIKHLSDKPNRLYDALLSCGENFNAQLIAQYNNSQGIPTRYISP

KEAGLTVTDLPQQAQILDSAYNEIYKLRDYDEKLIIPGFFGVSKQNYIVTFPRGGSDITG
AIIARGVRASLYENFTDVSGIYKANPNIINNPELIEEITYREMRELSYAGFGVFHDEALQ 
PLYKDRIPVVIKNTNRPNDKGTYILHDREIDSKNVISGISCDKGFTVINIKKYLMNRLVG
FTRKILGVLEEFNISFDHMPSGIDNISIIMRTNQIQGKESQVLNAIRKRCEVDELSIDHD 
LAVLMIVGEGMNQVVGTASKITHALSESNINLIMINQGASEISMMFGIHEADAEKAVLST

YEFCYNGVCLKNLCK*

Sequence 3187 
Contig_0707_pos_8234_4395
>gp:gp|AF067776|AF067776_1 Abiotrophia defectiva extracellu
lar matrix binding protein (emb) gene, partial cds. NID: g32
49002.

gtgaaatctgaagctagacaagcagtacagaataaagcaaatgaacagattaatcatatt 
caaaacacgcctgatgcaactaatgaagaaaaacaagaggcaataaatagagtaagtgct 
gaattagcaagagttcaagcacaaataaatgcagaacatacaacccaaggtgtcaaaact 

atcaaagacgacgcgataacttctttatctcgaattaatgcacaagttgttgagaaagag
tctgcaagaaatgcaatcgaacaaaaggcaacacaacaaacgcaatttattaataataat 
gataatgctacagatgaagaaaaagaggtcgccaacaatttagttatcgctacaaaacaa 
aaatcattagataatattaactccttatcttcaaataatgatgttgaaaatgctaaagta 
gcaggaataaatgaaatagctaacgttttaccagcaaccgctgttaagtcaaaagcaaaa 

aaagatattgatcaaaaactcgcgcaacagattaatcaaattcaaacgcatcaaactgct
acaactgaggaaaaagaagcggctattcaattggcaaatcaaaaatcaaatgaagcaaga 
acagcaattcaaaatgaacatagtaacaatggtgtcgcacaagctaaatctaacggcatt 
catgaaattgaattagttatgccagatgcgcacaaaaaatctgatgctaaacaaagtatc 
gataataaatataatgagcaaagtaatactatcaacactacaccagatgcaacagatgaa 

gaaaagcaaaaagcattagataaattaaaaatagctaaagatgcaggatacaacaaagtt
gatcaagcgcaaacaaaccaacaagtatctgatgcaaaaactgaggctatagatacgata 
actaatattcaagcaaatgttgcaaaaaaaccatccgctcgagtggaattagattcaaag 
tttgaggatttaaagcgtcaaatcaatgcaacgcccaatgctacagaagaagaaaaacaa 
gatgcaattcaaagattgaatggtaaaagagatgaagttaagaatctaataaatcaagat
agacgtgacaatgaagttgaacagcacaaaaatattggacttcaagaattagaaacgatt
catgctaatccaactagaaaatctgatgcgctccaagagtcacaaactaaatttatttca 
caaacagagttaattaataataacaaagatgcaactaatgaagaaaaagatgaagccaaa 
cgacttcttgagattagtaaaaataaaactataacaaatatcaatcaagcgcaaactaat 
aatcaagttgataatgctaaagataacggcatgaatgagattgctaccataataccagca
acaacaattaaaacagatgcaaaaacggctattgataaaaaagctgagcaacaagttaca 
atcatcaatggtaacaacgatgcaacagatgaagaaaaagcagaggctagaaagctggtt 
gaaaaagcgaaaattgaagccaaatctaatattacaaatagtgatactgaaagggaagtc 
aatggtgctaaaaccaatgggttagaaaaaataaacaatattcaaccatcaactcaaact 

aaaacaaatgctaagcaagaaataaatgacaaagctcaagaacaattaatccaaattaat
aacacgcctgatgcaaccgaagaagaaaagcaagaggcaacaaatagagtcaatgctgga 
ttagcacaagcaatacaaaatattaataatgcacatagtactcaagaagtaaatgaatct 
aaaacaaatagtattgctacaatcaagagtgtacaacccaatgtgatcaaaaaaccgact 
gctataaatagtttgactcaagaagctaataatcaaaagacgttaataggtaatgatggt 

aatgctactgatgatgaaaaagaggctgcaaagcaattagtgacccaaaaattaaatgaa
caaattcaaaaaattcatgaaagtacacaagataatcaagttgataacgtaaaagcacaa 
gctatcactgcaattaaattgattaatgcaaatgcacataaaagacaagatgccattaat 
attttgactaatctagctgaaagtaaaaaatcagatataagagccaatcaagatgcaacc 
actgaagagaaaaatacggcaatacaatctatagatgatacgttagcacaagcacgtaac 

aatattaatggtgcaaatacaaatgcgttagtggatgagaatttagaagacggtaagcaa
aagttacaacgtattgtgttgtcaactcaaactaaaacacaagctaaagcagacattgct 
caagcaataggtcaacaaaggtcgacaatagaccagaatcaaaatgctacaacagaagaa 
aaacaagaagcccttgagagacttaatcaagaaacaaatggagtcaatgatagaatacaa
gcagctttagcaaatcaaaatgttacagacgaaaaaaataatatattagaaacaataaga 

aatgttgaacctattgtaattgtaaaaccaaaggctaatgaaataattagaaaaaaagct
gcggaacaaacgactttaataaatcaaaatcaagatgcgacactagaagaaaaacaaata 
gcacttggcaaattagaagaagtaaagaatgaagcgttaaatcaagtatcacaggcacac 
tcaaataatgatgtgaaaattgtggaaaataatggaattgctaaaatttctgaggtccat 
cctgagactataattaaacgtaatgctaaacaagaaattgaacaagatgcgcaaagtcaa 

attgatactatcaatgcaaataataaatcaactaatgaagaaaaatcagccgctatagat
agagttaatgtagctaaaattgatgctattaacaatattactaatgctacaactacacaa 
ttagttaatgatgctaaaaatagtggtaacacgagtattagccaaatattaccaagtaca 
gcagtcaaaactaatgcattagcagctctagctagcgaagctaaaaataaaaacgctata 
atagatcaaacaccaaatgcgacagcagaagaaaaagaagaagcaaataataaagttgat 

cgtcttcaagaagaagcagatgctaatatcctaaaagcacacactactgatgaagttaat
aatattaaaaatcaagctgttcaaaatattaacgctgttcaagttgaagttatcaagaaa 
caaaacgctaaaaaccaattaaatcaattcattgataatcaaaagaaaattattgaaaat 
acgcctgatgcaacactagaagaaaaagctgaagctaatagattgcttcaaaatgtacta 
acttccacatcagatgaaattgctaatgtagatcataacaacgaggttgatcaagcttta 

gataaagctagaccaaaaatcgaggcaattgtaccacaagttagtaagaaacgagatgct
ttaaatgcaatccaagaagcatttaattcacaaactcaagaaatacaagagaagcaagaa 
gctacgaatgaagaaaaaaccgaagcattaaataaaataaaccaattacttaatcaggct 
aaagtaaatattgatcaagcacagtcaaataaagatgtagatagtgcgaaaacacgtagt 
attcaagatatagagcaaattcaaccacatccacaaacaaaagcaaccgggcgtcacaga 

ttaaatgaaaaagctaaccaacaacaaagtactattgcaactcatcctaattcaacaatt
gaagaaagacaggaagcaagtgcaaaactacaagaagttcttaaaaaaagccatagctaa 

Sequence 3188 

VKSEARQAVQNKANEQINHIQNTPDATNEEKQEAINRVSAELARVQAQINAEHTTQGVKT 
IKDDAITSLSRINAQVVEKESARNAIEQKATQQTQFINNNDNATDEEKEVANNLVIATKQ
KSLDNINSLSSNNDVENAKVAGINEIANVLPATAVKSKAKKDIDQKLAQQINQIQTHQTA 
TTEEKEAAIQLANQKSNEARTAIQNEHSNNGVAQAKSNGIHEIELVMPDAHKKSDAKQSI 
DNKYNEQSNTINTTPDATDEEKQKALDKLKIAKDAGYNKVDQAQTNQQVSDAKTEAIDTI 
TNIQANVAKKPSARVELDSKFEDLKRQINATPNATEEEKQDAIQRLNGKRDEVKNLINQD 
RRDNEVEQHKNIGLQELETIHANPTRKSDALQELQTKFISQTELINNNKDATNEEKDEAK
RLLEISKNKTITNINQAQTNNQVDNAKDNGMNEIATIIPATTIKTDAKTAIDKKAEQQVT
IINGNNDATDEEKAEARKLVEKAKIEAKSNITNSDTEREVNGAKTNGLEKINNIQPSTQT
KTNAKQEINDKAQEQLIQINNTPDATEEEKQEATNRVNAGLAQAIQNINNAHSTQEVNES
KTNSIATIKSVQPNVIKKPTAINSLTQEANNQKTLIGNDGNATDDEKEAAKQLVTQKLNE
QIQKIHESTQDNQVDNVKAQAITAIKLINANAHKRQDAINILTNLAESKKSDIRANQDAT
TEEKNTAIQSIDDTLAQARNNINGANTNALVDENLEDGKQKLQRIVLSTQTKTQAKADIA
QAIGQQRSTIDQNQNATTEEKQEALERLNQETNGVNDRIQAALANQNVTDEKNNILETIR
NVEPIVIVKPKANEIIRKKAAEQTTLINQNQDATLEEKQIALGKLEEVKNEALNQVSQAH
SNNDVKIVENNGIAKISEVHPETIIKRNAKQEIEQDAQSQIDTINANNKSTNEEKSAAID
RVNVAKIDAINNITNATTTQLVNDAKNSGNTSISQILPSTAVKTNALAALASEAKNKNAI
IDQTPNATAEEKEEANNKVDRLQEEADANILKAHTTDEVNNIKNQAVQNINAVQVEVIKK 
QNAKNQLNQFIDNQKKIIENTPDATLEEKAEANRLLQNVLTSTSDEIANVDHNNEVDQAL 
DKARPKIEAIVPQVSKKRDALNAIQEAFNSQTQEIQEKQEATNEEKTEALNKINQLLNQA 
KVNIDQAQSNKDVDSAKTRSIQDIEQIQPHPQTKATGRHRLNEKANQQQSTIATHPNSTI
EERQEASAKLQEVLKKSHS* 

Sequence 3189 
Contig_0711_pos_3905_0 

is similar to (with p-value 2.0e-53)
>sp:sp|O69282 |MQO_CORGL MALATE:QUINONE OXIDOREDUCTASE (EC 1
.1.99.16) (MALATE DEHYDROGENASE (ACCEPTOR)) (MQO).
gtgagcaaaaaaatggctaataaagagtcaaaaaatgttgttattattggcgctggtgtc 
ttaagtacgacatttggttctatgattaaagaattagaacctgattggaacatcaaactc 

tatgaacgcttagatcgtccaggtattgaaagttctaacgaaagaaacaatgccggtaca
ggacatgcggcgttatgtgaattgaactatacagtacaacaacctgatggttcaattgat 
atagaaaaagccaaagaaatcaacgaacaattcgagatttcaaaacaattctggggtcac 
ttagtaaaaagtggtaacatcagtaaccctagagatttcattaatccacttcctcacatt 
agtttcgtaagaggtaaaaataacgttaaattcttaaaaaaccgttacgaagcaatgcgt 

aacttccctatgttcgataacatcgaatatacagaagatatcgaagaaatgagaaaatgg
atgccattaatgatgacaggtcgtactggtaacgaaatcatggcggctagtaaaatcgac 
gaaggtacagatgttaactacggtgaattaactcgtaaaatggcaaaaagtattgaaaaa 
catccaaatgctgatgttcaatacaaccacgaagtaattaatttcaatcgtcgtaaagac 
ggtatttgggaagttaaagtta 


Sequence 3190 

VSKKMANKESKNVVIIGAGVLSTTFGSMIKELEPDWNIKLYERLDRPGIESSNERNNAGT
GHAALCELNYTVQQPDGSIDIEKAKEINEQFEISKQFWGHLVKSGNISNPRDFINPLPHI 
SFVRGKNNVKFLKNRYEAMRNFPMFDNIEYTEDIEEMRKWMPLMMTGRTGNEIMAASKID
EGTDVNYGELTRKMAKSIEKHPNADVQYNHEVINFNRRKDGIWEVKVX

Sequence 3191 
Contig_0712_pos_4917_4060 
is similar to (with p-value 3.0e-91)
>gp:gp|D13095 |BACPK_1 B. stearothermophilus phosphofructoki
nase and pyruvate kinase genes. NID: g285620.

atgtttaaagatttttttaatcgaagcaagaaaaagaaatatttaacagttcaagattct 
aaacaaaatgatgtacctgctggtataatgacaaaatgtcctaattgcaaaaaaataatg 
tatacaaaagaattgaatgaaaatttaaatgtatgctttaattgtgatcatcatatagct 

ttaaccgcatataaaagaatagaagcaatttcagacgatggatcatttatagaatttgat
agaggtatgacatctgctaacccattagactttcctgggtatgaagaaaaaattgaaaaa 
gatcagcaaaagactggacttaatgaagcgttagtgtctggtactgcgaaattagatgga 
atacaatatggtgttgcagttatggatgctcgttttagaatgggaagcatgggctctgta 
gttggtgaaaaaatatgcagaattattgattattgtacagaacatcgtttgccatttatt

ctgttttctgcgagtggtggagctagaatgcaagagggaattatttctttaatgcaaatg
gggaaaacaagtgtttctttaaaaagacattctgatgcaggactattatatatttcttac 
ataactaatcccactactggaggggtttctgcaagttttgcttcggttggagatattaat 
ttaagtgaacctaaagcactaatcggatttgctggtagacgtgttatagaacaaacaatt 
aatgaaaagttgcctgatgatttccaaactgctgagtttttattagagcatggtcaactt 

gataaagtcattcatcgaaaagatatgcgtgagactttatcaaatattttaaaaatccat
caagaggtgagtaactaa 

Sequence 3192 

MFKDFFNRSKKKKYLTVQDSKQNDVPAGIMTKCPNCKKIMYTKELNENLNVCFNCDHHIA
LTAYKRIEAISDDGSFIEFDRGMTSANPLDFPGYEEKIEKDQQKTGLNEALVSGTAKLDG
IQYGVAVMDARFRMGSMGSVVGEKICRIIDYCTEHRLPFILFSASGGARMQEGIISLMQM
GKTSVSLKRHSDAGLLYISYITNPTTGGVSASFASVGDINLSEPKALIGFAGRRVIEQTI
NEKLPDDFQTAEFLLEHGQLDKVIHRKDMRETLSNILKIHQEVSN*


Sequence 3193 
Contig_0712_pos_3958_3116
is similar to (with p-value 2.0e-74)
>sp:sp|Q54776|ACCD_SYNP7 ACETYL-COENZYME A CARBOXYLASE CARB
OXYL TRANSFERASE SUBUNIT BETA (EC 6.4.1.2). >gp:gp|U59237 |SP
U59237_6 Synechococcus PCC7942 ORF102, ORF120, ORF113, ORF12
8, CTP synthetase (pyrG), carboxyl transferase beta subunit (
accD), ORF145 and ORF123 genes, complete cds. NID: g1399849.

atgctggaagcatcattaaaaagagaaactacaaaagtgtacactaatctaaaaccttgg
gatcgtgttcaaatcgctcgtttaccagaaagaccaaccacattagattatattccctat
atttttgattcatttattgagttacatggcgatagaagttttagggatgatccagcaatg 
attggtggaattggttacttagatggtaagtctgtaacagttataggccaacaacgtggt 
aaagacacgaaagataatatttatcgtaattttggtatggctcacccagaagggtataga 

aaagctttgcgtttaatgaaacaagcagagaaatttaatcgtccaatatttacttttata
gatactaaaggtgcttatccgggtaaagcagctgaagaaagaggtcaaagtgaatcaatt 
gcaaaaaatttgatggaaatggcttcattaacggtaccagttattgctgttgttattggt 
gaaggcggaagtggcggcgctttaggaattggaatctcaaatcgtgttctgatgcttgaa 
aatagtacttattcagttatttcacctgaaggagcagctgcacttttatggaaagatagt 

aacttagctcaaattgccgctgaaactatgaaaatcactgcgcatgatttactagattta
ggtattatagatgaagtgattaatgagccacttggtggtgcgcaaaaagatgaagaagca 
caagctttatcaattaagaaaatgttccttaaacatttaaatgaattaaagcaactcaca 
cctgaagaattagcaaatgatcgttttgaaaaatttagaaaaattggttcagttgtggag 
tga 


Sequence 3194 

MLEASLKRETTKVYTNLKPWDRVQIARLPERPTTLDYIPYIFDSFIELHGDRSFRDDPAM 
IGGIGYLDGKSVTVIGQQRGKDTKDNIYRNFGMAHPEGYRKALRLMKQAEKFNRPIFTFI 
DTKGAYPGKAAEERGQSESIAKNLMEMASLTVPVIAVVIGEGGSGGALGIGISNRVLMLE
NSTYSVISPEGAAALLWKDSNLAQIAAETMKITAHDLLDLGIIDEVINEPLGGAQKDEEA
QALSIKKMFLKHLNELKQLTPEELANDRFEKFRKIGSVVE*

Sequence 3195 
Contig_0714_pos_982_1668 

is similar to (with p-value 9.0e-54)
>sp:sp|P54374|AROE_BACSU SHIKIMATE 5-DEHYDROGENASE (EC 1.1.
1.25). >gp:gp|D84432|BACJH642_93 Bacillus subtilis DNA, 283
Kb region containing skin element. NID: g2627063. >gp:gp|Z99
117|BSUB0014_46 Bacillus subtilis complete genome (section 1
4 of 21): from 2599451 to 2812870. NID: g2634966.

atgataggaattattggagcaatggaagaagaagtgacgattttaaagcgtaaattgaat 
gatatgaatgaaataaatattgcgcatgttaaattttatgttggcaagctaaaccacaaa 
gaggtggttttaacacaaagtggtataggtaaagttaatgcttctatctcaacgactttg 
ttaatagaaaaatttaatccagaagtcgtcattaatactggatcagcaggtgcactagat 

caaacactatctattggagatatattagtgagtaatcatgtattatatcatgatgctaat
gctacagcgtttggttatgaatatggacaaatacctcaaatgcctaaaacttatactact
gatcctactttgttgaaaaaaacaatgcatgtattagaacaacaacaactgaatggtaaa
gtaggtatgattgttagtggtgatagttttataggtagctcagaacagcgacaaaaaatt 
aagcaacaatttccagaagctatggctgtcgaaatggaggcaactgcaattgcgcaaaca 

tgttatcaatttaaagtaccatttatcgtaactagagctgtttctgatttagcaaacggt
aaagccgatatttcttttgaagaatttttagataaagcagctttatcatctagtgagaca 
gtttcattattagtagaatcattataa 

Sequence 3196

MIGIIGAMEEEVTILKRKLNDMNEINIAHVKFYVGKLNHKEVVLTQSGIGKVNASISTTL
LIEKFNPEVVINTGSAGALDQTLSIGDILVSNHVLYHDANATAFGYEYGQIPQMPKTYTT
DPTLLKKTMHVLEQQQLNGKVGMIVSGDSFIGSSEQRQKIKQQFPEAMAVEMEATAIAQT 
CYQFKVPFIVTRAVSDLANGKADISFEEFLDKAALSSSETVSLLVESL* 


Sequence 3197 
Contig_0714_pos_3323_4141
is similar to (with p-value 1.0e-57)
>sp:sp|P24247|PFS_ECOLI PFS PROTEIN (P46). >pir:pir|S45227|
S45227 purine nucleoside phosphorylase homolog - Escherichia
coli >gp:gp|D26562|EC082K_47 Escherichia coli genome, 2.4-4
.1 min region (110,917-193,643 bp from 0 min). NID: g473770.
>gp:gp|U70214|ECU70214_10 Escherichia coli chromosome minut
es 4-6. NID: g1552727. >gp:gp|AE000125|AE000125_6 Escherichi
a coli K-12 MG1655 section 15 of 400 of the complete genome.
NID: g1786348. >gp:gp|U24438|ECU24438_1 Escherichia coli MT
A/SAH nucleosidase gene, complete cds. NID: g2981266.
gtgataaaagtgaaatttgcagtaattggaaaccccatttctcattcattatcgccattg 
atgcatcatgctaattttcaatctttaaatttggaaaacacgtatgaagcgataaatgta 

ccagttaatcaatttcaagacattaaaaaaataatttcagaaaagagtattgatggattc
aatgttactattccacataaagaacgtattattccgtacctagatgatattaatgaacaa 
gcgaaatctgttggggcggtaaatacagttttagttaaagatggtaagtggattggttat 
aatactgatggaattggttatgttaatggtttaaaacaaatatatgaaggtatagaagac 
gcttatatattaattttaggtgcaggtggagcaagtaaaggtatagcaaatgaattatat 

aaaatcgttcgtccgactttaacagttgcaaatagaacgatgtctcgttttaataattgg
tcgttaaatattaacaaaataaatttaagccatgcagaaagccatttagatgaatttgat 
attataataaacactacacctgctggtatgaacggcaatacagattctgtaatttcttta 
aatcgtttagcttcacatactttagtaagtgatattgtttataatccatataaaacacca 
atactaatagaagctgaacaaagaggtaatccaatctataatggtcttgatatgttcgtt 

catcaaggtgctgaaagttttaaaatttggactaatctagaaccagatataaaagcaatg
aaaaacatagtaattcaaaaattgaaaggagaattatga

Sequence 3198 

VIKVKFAVIGNPISHSLSPLMHHANFQSLNLENTYEAINVPVNQFQDIKKIISEKSIDGF 
NVTIPHKERIIPYLDDINEQAKSVGAVNTVLVKDGKWIGYNTDGIGYVNGLKQIYEGIED
AYILILGAGGASKGIANELYKIVRPTLTVANRTMSRFNNWSLNINKINLSHAESHLDEFD 
IIINTTPAGMNGNTDSVISLNRLASHTLVSDIVYNPYKTPILIEAEQRGNPIYNGLDMFV 
HQGAESFKIWTNLEPDIKAMKNIVIQKLKGEL* 

Sequence 3199
Contig_0717_pos_7272_6145
is similar to (with p-value 1.0e-99)
>sp:sp|P54524|YQIG_BACSU PROBABLE NADH-DEPENDENT FLAVIN OXI
DOREDUCTASE YQIG (EC 1.-.-.-). >gp:gp|D84432|BACJH642_230 Ba
cillus subtilis DNA, 283 Kb region containing skin element.
NID: g2627063. >gp:gp|Z99116|BSUB0013_132 Bacillus subtilis
complete genome (section 13 of 21): from 2395261 to 2613730.
NID: g2634723.

atgaataataaacatgaacctttatttaaatctttaacactacctaatggtgttgaagta 
agaaatcgttttgttttagctcctcttacacatacttcatcaaatgatgatggaacaatt
tcagatatagaattaccttacattgagaaacgttctaaagatgttgggattgcaattaat 
gcagcaagtaatgttaatgatgtaggcaaagcatttcctggtcaaccttctgttgcacat 
gactcagatatcgaagggttaaaagaacttgctcaagttatgaagaaaaatggtgcgaaa 
gcaatagtccaaattcatcatggaggtgctcaagctttaccagagttaacacctgatgga 
gatgttgttgcaccaagtgccatttctcttaaaagttttggtcagcaaaaagaacatgat
gctcgtgagatgactgctgaagaaattgaacaaactattagagactttggtgaagcaact 
agaagagcaattgaagcaggttttgatggcgttgaaattcatggcgcaaaccattatctt 
attcaccaatttgtttctccttactataatagaagaaatgatgtttgggctgataactat 
aaattccctgttgctgttatagatgaagttgttaaagctaaaaaagctcatgcatatgat 






gattttattattggttacagattgtcacctgaagaagcggaatcaccaggtatttcaatg 
gagataactgaagaattaattcaccaaatcgcaaataaaccacttgattatattcatgtg 
tcattaatggatgttaactcagttacgcgagaaggtaaatataaaggtgaaaatcgcttg 
gaacttattcatcaatggataaatggacgtatgccgcttattggtataggttctgtcctt 
acagctgaagatgcactaaatgctgttgaaaacattggagttgaatttgttgcgttaggt
tgtgaaattctacttgattatgattttgttgctaaaattaaagaaggtcgagaagacgaa 
attataaatgcttttgatcctaatcgtgaagaccaacattatctaacaccaaatctttgg 
gaacagtttaatcaaggattctatccattacctcgaaaagacaaataa 

Sequence 3200

MNNKYEPLFKSLTLPNGVEVRNRFVLAPLTHTSSNDDGTISDIELPYIEKRSKDVGIAIN 
AASNVNDVGKAFPGQPSVAHDSDIEGLKELAQVMKKNGAKAIVQIHHGGAQALPELTPDG 
DVVAPSAISLKSFGQQKEHDAREMTAEEIEQTIRDFGEATRRAIEAGFDGVEIHGANHYL
IHQFVSPYYNRRNDVWADNYKFPVAVIDEVVKAKKAHAYDDFIIGYRLSPEEAESPGISM

EITEELIHQIANKPLDYIHVSLMDVNSVTREGKYKGENRLELIHQWINGRMPLIGIGSVF
TAEDALNAVENIGVEFVALGCEILLDYDFVAKIKEGREDEIINAFDPNREDQHYLTPNLW
EQFNQGFYPLPRKDK* 

Sequence 3201 

Contig_0718_pos_2272_1247
is similar to (with p-value 3.0e-29)
>sp:sp|P31547|YAEE_ECOLI HYPOTHETICAL ABC TRANSPORTER PERME
ASE PROTEIN YAEE. >gp:gp|D83536|ECOTSF_23 Escherichia coli g
enome, 4.0 - 6.0 min region. NID: g1208942. >gp:gp|U70214|EC
U70214_47 Escherichia coli chromosome minutes 4-6. NID: g155
2727. >gp:gp|AE000129 |AE000129_2 Escherichia coli K-12 MG165
5 section 19 of 400 of the complete genome. NID: g1786395.
gtgattgaattcaaaaatgttaacaaagtttttcgcaaaaaaagagaaactattcaagct 
ttgaaaaatgtatcatttaagattgaccaacatgatatttttggtgttattggttatagt 

ggtgctggtaaaagtacattagttcggttagtcaatcaacttgagacagtatcagatggt
caagttattgttgatggtcatgagattgatacatataaagaaaaagatttacgtgatatt 
aaaaaagatatcggtatgatctttcaacatttcaatttgcttaattctaaatcagtctac 
aaaaatgttgcaatgccacttattttaagtaagacaaataagaaagaaattaaggaaaaa 
gttgacgaaatgttagaatttgtggggcttgctgataaaaaagatcaatttccagatgaa 

ttatcaggtggacaaaaacaacgtgttgccatcgcaagagcattagtaacgcatcctaaa
atattattatgtgatgaagcgacaagtgctctggatccagctactacaagctcaatttta 
aatttattaagtaatgtgaatcgaacatttggtgtgacgattatgatgattacacatgaa 
atgagcgtaattcaaaaaatttgtcatcgtgtagctgtcatggaaaatggcgaagtgata 
gaaatggggacagttaaagatgtctttagtcatccacaaacgaacactgcaaaaaatttc 

gtttcgacggtgattaacactgagccttcaaaagagttacgggcctcttttaactcgaga
aaagattcaaatttcacagattataaactgtttttagactctgaacaaattcaattgcca 
atattgaacgagcttatcaacgagcatcatcttaacgttaacgtattattttcttctatg 
tcagaaattcaagatgaaacggtttgttatttgtggttgagatttgagcatgatgagtca 
tttaatgattttaaacttactgattacctttcaaaacgacatattcggtatgaggaggtt

atataa

Sequence 3202 

VIEFKNVNKVFRKKRETIQALKNVSFKIDQHDIFGVIGYSGAGKSTLVRLVNQLETVSDG 
QVIVDGHEIDTYKEKDLRDIKKDIGMIFQHFNLLNSKSVYKNVAMPLILSKTNKKEIKEK
VDEMLEFVGLADKKDQFPDELSGGQKQRVAIARALVTHPKILLCDEATSALDPATTSSIL
NLLSNVNRTFGVTIMMITHEMSVIQKICHRVAVMENGEVIEMGTVKDVFSHPQTNTAKNF 
VSTVINTEPSKELRASFNSRKDSNFTDYKLFLDSEQIQLPILNELINEHHLNVNVLFSSM 
SEIQDETVCYLWLRFEHDESFNDFKLTDYLSKRHIRYEEVI* 

Sequence 3203

Contig_0718_pos_1244_585

is similar to (with p-value 6.0e-81)

>sp:sp|P30750|ABC_ECOLI ATP-BINDING PROTEIN ABC. >gp:gp|U70
214|ECU70214_48 Escherichia coli chromosome minutes 4-6. NID
: g1552727. >gp:gp|AE000129|AE000129_3 Escherichia coli K-12
MG1655 section 19 of 400 of the complete genome. NID: g1786
395.

atgtttggttcaagtttagattcatctcaattattacaagctctatacgaaacattgtat 
atggtgactgtatcacttgtaatcggtgctttaataggtatacctcttggcatcttgtta
gtggtaactagaaaaaacggtatatggtcgaatacaatattgcatcaagtattaaatcct 
atcattaatattttaagatcaattccgttcattattttattaatagccatagtgcctttt 
actaaattgctagttggcacatctatcggcacaacagcagccatagtaccactcacggtt 
tatgtagcaccttatatcgcacgcttagtagaaaactcattactggaagtcgatgacggg 
attattgaggcagctaaagcaatgggtgcatcacctcttcaaattatacgttatttttta
ttgccagaagcacttggttcattaattctagctataactacagctattattggtctcata 
ggtagtacagcaatggctggtgctgttggtggtggcggtataggtgatttggctttagtg 
tatggttatcaacgattcgatacaattgtcattgtgattacagtcattgtacttattatt 
attgttcaaattatacaaacgttaggtaactttatcgctagggttatccgtagaaattaa 


Sequence 3204 

MFGSSLDSSQLLQALYETLYMVTVSLVIGALIGIPLGILLVVTRKNGIWSNTILHQVLNP
IINILRSIPFIILLIAIVPFTKLLVGTSIGTTAAIVPLTVYVAPYIARLVENSLLEVDDG 
IIEAAKAMGASPLQIIRYFLLPEALGSLILAITTAIIGLIGSTAMAGAVGGGGIGDLALV 
YGYQRFDTIVIVITVIVLIIIVQIIQTLGNFIARVIRRN*

Sequence 3205 
Contig_0719_pos_4886_3567
>sp:sp|P52673|CYSI_THIRO SULFITE REDUCTASE (NADPH) HEMOPROT
EIN BETA-COMPONENT (EC 1.8.1.2) (SIR-HP). >gp:gp|Z23169|TRCY
SC0MA_3 T.roseopersicina cysJ, cysI, cysH genes, complete CD
S, and cysB gene 5' end. NID: g1518424.

atgaaaaatattaatcatgcagtacttgattctattgctgcatgtggagatgttaatcgt 
aatacgatgtgcaatcctaatccttatcaatctcaagtacataaggagattaatgattat 

gcaacgcgtataagtaatcacttacttccaagaacaaatgcatatcatgaaatttggctt
gatggtgaaaaggttttagattcgagtgaggaaaaggaacctatttatgggaatacgtat 
ttaccacgtaaattcaaaataggtattgcagtaccaccatctaatgatattgacgtctat 
tctcaagatattggtttaatcgctatcgttgaacaagatgagttaattggatttaatgtg 
actatcggtggcggtatgggtatgactcatggtaatactgaaacatatcctcaacttgga 

cgtctcataggttttatacctaaggaaaaggttgtagatgtatgtgagaaaatacttaca
atacaacgtgattatggtaatcgtgaaaatcgaaaaaatgcacgttttaaatatacagtg 
gaccgtctaggagaaacttgggtgactgaagaattaaaccgacgattaggttgggaaatt 
aaagcgccacgtgatttcgaatttgaacataatggtgatcgattaggttggattgaaggt 
attaataattggaatttcactttatttatacaaaatgggcgtgtgaaagatactgaagac 

tatttgttaaaaacaaccttaagagaaatcgcagaaatccatactggagatttcagatta
tcacctaatcagaacttagttattgcaaatgtttctcctgagaaaaaggaagaaatacaa 
gctattattgataaatataaattaacagatggcaaaaattatacaggacttagaagaaat 
tctatggcttgtgttgctttcccaacgtgtggtttagctatggcagaatctgaaagatat 
cttccttcactaattacaaaaattgaagatttattagatgagtctggtttaaaagaggaa 

gaaataacgattcgtatgacaggttgtcccaatggatgtgcgagaccagcgctagcagaa
atagcctttatcggtaaagcacctggtaaatataatatgtacttaggtggtagttttaaa 
ggcgaacgtctaaataaaatatataaagagaatatcgacgaaaatgagatattagaaagt 
ctacgtccattgttgttgcgttatagtaaagagcgtcttgacggagaacactttggggac 
tttgtaattcgtgacggtgtgatagccaaagttcatgatggtcgcgattttcatagttaa 


Sequence 3206 

MKNINHAVLDSIAACGDVNRNTMCNPNPYQSQVHKEINDYATRISNHLLPRTNAYHEIWL 
DGEKVLDSSEEKEPIYGNTYLPRKFKIGIAVPPSNDIDVYSQDIGLIAIVEQDELIGFNV 
TIGGGMGMTHGNTETYPQLGRLIGFIPKEKVVDVCEKILTIQRDYGNRENRKNARFKYTV
DRLGETWVTEELNRRLGWEIKAPRDFEFEHNGDRLGWIEGINNWNFTLFIQNGRVKDTED
YLLKTTLREIAEIHTGDFRLSPNQNLVIANVSPEKKEEIQAIIDKYKLTDGKNYTGLRRN 
SMACVAFPTCGLAMAESERYLPSLITKIEDLLDESGLKEEEITIRMTGCPNGCARPALAE 
IAFIGKAPGKYNMYLGGSFKGERLNKIYKENIDENEILESLRPLLLRYSKERLDGEHFGD
FVIRDGVIAKVHDGRDFHS*

Sequence 3207

Contig_0728_pos_9983_10327
>gp:gp|AF043386|AF043386_2 Clostridium acetobutylicum glyce
raldehyde-3-phosphate dehydrogenase (gap), phosphoglycerate
kinase (pgk), and triosephosphate isomerase (tpi) genes, com
plete cds; and 2,3-bpg-independent phosphoglycerate mutase (
pgm-i) gene, partial cds. NID: g2829136.

atgcagattgaagaaaccttccgagacttgaaaagtcctgcctacggactaggcctacgc 
catagccgaacgagcagctcagagcgttttgatatcatgctgctaatcgccctgatgctt
caactaacatgttggcttgcgggcgttcatgctcagaaacaaggttgggacaagcacttc 

caggctaacacagtcagaaatcgaaacgtactctcaacagttcgcttaggcatggaagtt 
ttgcggcattctggctacacaataacaagggaagactcactcgtggctgcaaccctgctt 
actcaaaatctattcacacatggttacgttttggggaaattatga 


Sequence 3208 

MQIEETFRDLKSPAYGLGLRHSRTSSSERFDIMLLIALMLQLTCWLAGVHAQKQGWDKHF 
QANTVRNRNVLSTVRLGMEVLRHSGYTITREDSLVAATLLTQNLFTHGYVLGKL*

Sequence 3209

Contig_0728_pos_8652_7642 

is similar to (with p-value 4.0e-59)
>pir:pir|I67760|I67760 transposase (insertion sequence IS10
) - Escherichia coli >gp:gp|S67119|S67119_2 BST=somatotropin
...BST/beta-Gal fusion protein [Escherichia coli, LBB84, pla
smid pXT107, IS10L/R-1, PlasmidTransposonInsertionMutant, 3
genes, 1679 nt]. NID: g455674.

atggcaattaaagtagcaattaatggttttggtagaattggtcgtttagcattcagaaga 
attcaagatgtagaaggtcttgaagtagttgcagttaacgacttaacagatgacgatatg 

ttagctcatttattaaaatacgatactatgcaaggtcgtttcactggagaagttgaagtt
atcgaaggtggattccgtgtgaacggtaaagaaattaaatcattcgatgaaccagatgct 
ggtaaattaccatggggcgatttagatatcgacgtagtattagaatgtactggtttctat 
actgataaagaaaaagcacaagctcacatcgatgcaggtgctaaaaaagtattaatctca 
gctccagctaaaggtgatgtaaaaacaatcgtattcaacactaaccatgatacattagat 

ggttcagaaacagttgtttcaggtgcttcttgtactactaactcattagcaccagttgca
aaagttttaagtgacgaattcggtttagttgaaggtttcatgactacaattcacgcttac
actggtgaccaaaatacacaagacgcacctcacagaaaaggtgacaaacgtcgtgcacgt 
gcagcagctgaaaatattatccctaactcaacaggtgctgctaaagctatcggtaaagtt 
attccagaaatcgatggtaaattagacggtggagcacaacgtgttccagttgctactggt 

tctttaactgaattaactgtagtattagacaaacaagatgtaactgttgaccaagttaac
agtgctatgaaacaagcttctgacgaatcattcggttacactgaagacgaaatcgtatct 
tctgatattgttggtatgacttacggttcattatttgatgcgactcaaactcgtgttatg 
actgttggagatcgtcaattagttaaagttgcagcttggtacgacaatgaaatgtcttac 
actgctcaattagtacgtacattagctcacttagctgaactttctaaataa 


Sequence 3210 

MAIKVAINGFGRIGRLAFRRIQDVEGLEVVAVNDLTDDDMLAHLLKYDTMQGRFTGEVEV
IEGGFRVNGKEIKSFDEPDAGKLPWGDLDIDVVLECTGFYTDKEKAQAHIDAGAKKVLIS
APAKGDVKTIVFNTNHDTLDGSETVVSGASCTTNSLAPVAKVLSDEFGLVEGFMTTIHAY
TGDQNTQDAPHRKGDKRRARAAAENIIPNSTGAAKAIGKVIPEIDGKLDGGAQRVPVATG
SLTELTVVLDKQDVTVDQVNSAMKQASDESFGYTEDEIVSSDIVGMTYGSLFDATQTRVM
TVGDRQLVKVAAWYDNEMSYTAQLVRTLAHLAELSK* 

Sequence 3211 
Contig_0732_pos_1186_2424

>gp:gp|AF043609|AF043609_1 Arthrobacter viscosus aluminum r
esistance protein (Alu-2) gene, complete cds. NID: g2827438.

atgcaagattttagcaatttagttgaagaagttgaaaacacacttattccttactttaga
aaaattgaaaagcgtgcattatttaatcaggaaaaggtcttaaatgcttttcaccatgtt
aaagctagcgaaagtgatttacaggggtctacgggttatggatatgatgattttgggaga 
gaccatttagaacaaatttatgcgcacacatttaaagcagatgacgcacttgtaagacct 
caaattatttcaggtactcatgctattactttagctttacaaagtacgttaaaaaacaat 
gatgaactactttatattacaggtagtccatatgatacacttctagaagtcattggtata
aatggcaatggtgttgaaagtcttaaagaatatggtgttcgctataatgaagtcgaatta 
cgtgacggtcgaattgatattcctaaagtcatcactgcaattaatgacaatacaaaagtt 
gtagcaattcaacgatcaaaaggatatgatcaacgtccatcaattacaattaatgaaatt 
gaacaagcaataacatctattaaagaggtttatcccaatatcattatttttgttgataat

tgttatggagaatttgtagaagataaagaaccgattgaagtaggtgctgatttaatcgcc
ggatcattaattaaaaatccaggtggaggtttagctaaaattggaggatatattgctggt 
agacaagacttaattgaacgctgtggttatcgtttaacagcaccaggcattggtaaggaa 
gcaggagcctcacttaattctttacaagaaatgtatcaaggattctttctagcgccacat 
gtggttagccaaagtttaaaaggtgcactgtttactagtttgttattagaaaaaataaac 

atgaagacctcccctaaatataatgtttatcgtacagacttaattcaaacggttcaattt
gagaccaaagagcaaatgatttcattttgccaaagtatacaacacgcttcaccaattaac 
gcacattttagtccagaacctagctatatgcctggatacgaagatgatgtcatcatggct 
gcaggtacatttattcagggctcgtctattgaattatccgcagacggacctatacgtccg 
ccttatgaagcatatgttcaaggtggtttaacttatgaacatgtcaaattagctgttaca 

cgtgcggtgcaacatatgcaagaaaacaatttactataa

Sequence 3212 

MQDFSNLVEEVENTLIPYFRKIEKRALFNQEKVLNAFHHVKASESDLQGSTGYGYDDFGR
DHLEQIYAHTFKADDALVRPQIISGTHAITLALQSTLKNNDELLYITGSPYDTLLEVIGI
NGNGVESLKEYGVRYNEVELRDGRIDIPKVITAINDNTKVVAIQRSKGYDQRPSITINEI
EQAITSIKEVYPNIIIFVDNCYGEFVEDKEPIEVGADLIAGSLIKNPGGGLAKIGGYIAG
RQDLIERCGYRLTAPGIGKEAGASLNSLQEMYQGFFLAPHVVSQSLKGALFTSLLLEKIN
MKTSPKYNVYRTDLIQTVQFETKEQMISFCQSIQHASPINAHFSPEPSYMPGYEDDVIMA 
AGTFIQGSSIELSADGPIRPPYEAYVQGGLTYEHVKLAVTRAVQHMQENNLL* 

Sequence 3213 
Contig_0732_pos_2905_3270
is similar to (with p-value 5.0e-55)

>gp:gp|X76490|SAGLNAR_2 S.aureus (bb270) glnA and glnR gene
s. NID: g1134885.

atgtctaatgattcaatcagacgaaacatggccgttttctctatgagtgtggttagtaaa 
ttgacagatttatcaccaagacaaattcgttactatgaaacacatgaacttgtgatgcct 
gaaagaacagatggaaataagagattattttctatgaacgatttagagaggttgttagaa 
ataaagtctcttatcgaaaagggatttaatattagaggtattaaacaaattatattcgat 
gagcaagggcatttaactactgatgaacaagagacaagaaagagaatgattgttgacgca
acgcagaaaccacgtagtgaaacattaccaataaatcgcggcgatttatctcgatttatt 
aaatga 

Sequence 3214 

MSNDSIRRNMAVFSMSVVSKLTDLSPRQIRYYETHELVMPERTDGNKRLFSMNDLERLLE
IKSLIEKGFNIRGIKQIIFDEQGHLTTDEQETRKRMIVDATQKPRSETLPINRGDLSRFI 
K* 

Sequence 3215 
Contig_0733_pos_1000_1344
is similar to (with p-value 2.0e-46)

>sp:sp|P94453|ALF_BACST FRUCTOSE-BISPHOSPHATE ALDOLASE (EC
4.1.2.13) (FRAGMENT).

atgaaagaaatgttaatcgatgcgaaagaaaacggttatgcggttggtcaatacaatctt 
aataacctcgaatttacacaagctattttagaagcgtctcaagaagagaatgcgccagtt 
attttaggtgtttctgaaggggcagctcgttatatgagtggtttttatacagttgtgaaa 
atggtagaaggtttaatgcatgacttaaacatcacaatcccagtagcaattcatttagac 
cacggttcaagctttgaaaaatgtaaagaagcaattgatgctggattcacatctgtaatg 
attgatgcatctcatagtccttttgaagaaaatgttgaaatatag

Sequence 3216

MKEMLIDAKENGYAVGQYNLNNLEFTQAILEASQEENAPVILGVSEGAARYMSGFYTVVK
MVEGLMHDLNITIPVAIHLDHGSSFEKCKEAIDAGFTSVMIDASHSPFEENVEI* 


Sequence 3217 

Contig_0733_pos_12244_11498 
is similar to (with p-value 4.0e-40)
>sp:sp|P55476|NODI_RHISN NODULATION ATP-BINDING PROTEIN I.
>gp:gp|AE000076|AE000076_5 Rhizobium sp. NGR234 plasmid pNGR
234a, section 13 of 46 of the complete plasmid sequence. NID
: g2182419.

gcgagtggtcactcattttatcggtgtgaggaggaacaatttatgatagaggtaaaaaat 
gtaagtaaatcctttggtaaacaacaagtggtagatgatatatctatatcatttaatgct 
ggtgaggttgtgggtctaatcggtccttcaggtactggaaaaacgacattaatacagtgc
attttaggcatggagaaaattgatggtgggcaagtcactattcaagaacatacaatgccg 

aatagaaagatattatcaaatattggttatatggctcaaaatgatgctttatataatgat 
ttaactggacgtgaaaatttaacgtttttcgcaagaatttatatgcgtgataaagaagat 
attaaaaaacgtgtgaacctatgcagttccatggttcaattagacaatgatttagataag 

aaagttgaaatgtattctggtggaatgaaacgacgcttatctttagctattagcttttta
caaaatcctaatatccttatattagacgaacctacagttggcattgatcctaaattgcgt 
cagacgatttggaaggatttaactaaagcaaaggctgaagataaatgcattatagtgaca 
acgcatgtattagatgaagctacacgctgtgataagctcgtattaatgaatcaaggaaag 
atattggcaacgggtacaccagatgaagtgaaaaaacaatatcacacagatacgattgaa 

ggcgtatttctgaatatggagggataa

Sequence 3218 

VSGHSFYRCEEEQFMIEVKNVSKSFGKQQVVDDISISFNAGEVVGLIGPSGTGKTTLIQC
ILGMEKIDGGQVTIQEHTMPNRKILSNIGYMAQNDALYNDLTGRENLTFFARIYMRDKED
IKKRVNLCSSMVQLDNDLDKKVEMYSGGMKRRLSLAISFLQNPNILILDEPTVGIDPKLR
QTIWKDLTKAKAEDKCIIVTTHVLDEATRCDKLVLMNQGKILATGTPDEVKKQYHTDTIE
GVFLNMEG* 

Sequence 3219 
Contig_0733_pos_9564_7870

is similar to (with p-value 7.0e-31) 

>gp:gp|AB009866|AB009866_7 Bacteriophage phi PVL proviral D 
NA, complete sequence. NID: g3341907. 

atggcaaagattagggattatgttacagaatatgcaaaaaaagtagttaatggcgatatt 

atagctagtaaaaaaaacgtgaaagcctgtcaacgccatttagatgacttgaacgattcg
gaactcccttatcattttgatgtaaagaaagctaatcacattattaagtttcttgaaatg
ttgccagatcctaaaactggtaaacaattatcgttaggcggttttcaaaaattcattgct 
ggtagcttaaatggttggtacgacagacatgggtacaaaagatttacaaaagcctatata 
tcaatgagcagaaaaaatggtaaaacattattgatctctggaatggcattgtacgattta 

ttgatgggtaaagatccgttgaatgaacggttgattggtttgagcgccaattcaagagac
caagctggtatagcatacgatatgacattggcacaactgaaagctattagaagcgtttct 
cctaaggttaaatcgatgactaagataacgccaagtgcaaaagaaatattgaatattaat 
gatcgaagtaaagttaaagccgtttcaaatgaagctgcaaatttagaaggtcatcagttt 
agctacgcaatcatcgatgaatatcatgaagctaaagataaaaagatttatgaaacgtta 

agacgtgggcaagtgctactgcacaaccctatattaattattatctcaacagctggaact
aatttgaatggtccgatgtatgaagaatatttatatattgataagatacttgacggcata 
gcaaaaaatgaaaactactttgttttctgtgctgaacaagatgatgagaaagaagtatat 
gacgttaaaacttggattaaatccaatccacttatggagttgccagaaatggcacaattg 
ttaactaagaatattcaaccagaagttaaaactgcaattgatagtggttcaggattaaat 

gggatattaataaagaatttcaatatgtggcgtgcagcaagcacagaatcttatttagat
ttcaatgattggaagaaaaatgaaatagactttgatataaatggctctaaaacttatatc 
ggtttagacttatcgcgtgctgacgacttaaccgcagtatcgtttgttcatcttgatgaa 
gataatcaagagtattatgtaactagtcattcgtttgtggctactaaaggtggattagat 
ggcaagattgatagagactttattgattacagacaacttgcagaaagtggttattgtacg
attaccgatttacaaagtggaattatcaatactgaccaagttttaaattacattgagaat
tatatcgaccaatataaattagacgtacaagcgttatgttatgatccttactcaatacat 
ggtgttattgcagaaattgagcgtagagattggccttatgatttagtagaaatcagacaa 
gggccacaaacactatctaatccgatactggattttagactgaaagtgattaatggggac 
atcaagcatcataaaaatccgttactagacattgcagtcaaaaatgctgtggcaaaagat
accaatgactcattaatgattgaaaagaagatgaaccgagaaaaaatagatccactcatg 
gctaccatatttgcttatgttatggcttgtgaacatgaatgggacacagaaactttaatg 
ccattgttcttatag 

Sequence 3220

MAKIRDYVTEYAKKVVNGDIIASKKNVKACQRHLDDLNDSELPYHFDVKKANHIIKFLEM
LPDPKTGKQLSLGGFQKFIAGSLNGWYDRHGYKRFTKAYISMSRKNGKTLLISGMALYDL
LMGKDPLNERLIGLSANSRDQAGIAYDMTLAQLKAIRSVSPKVKSMTKITPSAKEILNIN
DRSKVKAVSNEAANLEGHQFSYAIIDEYHEAKDKKIYETLRRGQVLLHNPILIIISTAGT
NLNGPMYEEYLYIDKILDGIAKNENYFVFCAEQDDEKEVYDVKTWIKSNPLMELPEMAQL
LTKNIQPEVKTAIDSGSGLNGILIKNFNMWRAASTESYLDFNDWKKNEIDFDINGSKTYI
GLDLSRADDLTAVSFVHLDEDNQEYYVTSHSFVATKGGLDGKIDRDFIDYRQLAESGYCT 
ITDLQSGIINTDQVLNYIENYIDQYKLDVQALCYDPYSIHGVIAEIERRDWPYDLVEIRQ 
GPQTLSNPILDFRLKVINGDIKHHKNPLLDIAVKNAVAKDTNDSLMIEKKMNREKIDPLM 

ATIFAYVMACEHEWDTETLMPLFL*

Sequence 3221 
Contig_0733_pos_7622_6432 
is similar to (with p-value 2.0e-16)
>gp:gp|AB009866|AB009866_5 Bacteriophage phi PVL proviral D
NA, complete sequence. NID: g3341907. 

atgaacagagatttagaacgattattgtattggcaagaacatggcacacatgcaagctat 
gttggtataaacgcgctacgtaacagtgatgtatttactgctacacgtattatatctgca 
gacattgcaagtaccaagttgaaagttaaaggtcacgaaacaaatacagtgatggaccaa 

atactggatctatttaataacaatccgtattcggacttaccgggttggcactttaagttt
ataatcatcgcgaatatgctgcttaacggtcaatcttttgttgaaattgtgcgtggcaaa 
aatgattttcctgttggattccacttcttacataacgacttagtaggaattgaggaaaaa 
gacggcgaaattatttacaacgtaagtgaagatgtggaaggtaatgccgttaagataaca 
agcgatgatatattacatttcagatatatcacattagatggatatataggatacagtccg 

ttgtatgcactagcacatgagattggtatttctcaaggctctaagagcttcctgcgtaac
ttcttcgataatggtgggacttcgacatcagtattgaagtatagaaaagggcaaatcaat 
gctgaacaattaagagaattgaaaaagaacttttcagaaagtcaattaaaaaacaacggt 
ggtttagttgctatcgatgacacaatggaatttaacagactacaaattcctaccgaagta 
ttgaacttcttaaatagttataagttcagcacatctcaagttgctaaagcgttcggtttg 

ccggtatctaaactaggtattgaaacagtcaatacatctatcacacaagcaaacttagag
tatttgcaaagtacattagatccaatatttaaaatgatgattgctgaactcgaaacgaaa 
atatttaaatttattgattctggtaacgaattagagtttgactcatcacgtctcattgac 
attgatccagagttacaattacaacgtattactgaattgcatagtaaaggaattatttca 
acagacgaagctagaagtgtatttggctatcaacctattgaacatggcgagcaaccattg 

gttgatcttaacagagcgccacttaacactttagaaaattaccaaaaatcgaaaattgac
aaagaagtcgaaaagaactccattaaagggggtgatgagtatgacgaatag

Sequence 3222 

MNRDLERLLYWQEHGTHASYVGINALRNSDVFTATRIISADIASTKLKVKGHETNTVMDQ
ILDLFNNNPYSDLPGWHFKFIIIANMLLNGQSFVEIVRGKNDFPVGFHFLHNDLVGIEEK
DGEIIYNVSEDVEGNAVKITSDDILHFRYITLDGYIGYSPLYALAHEIGISQGSKSFLRN 
FFDNGGTSTSVLKYRKGQINAEQLRELKKNFSESQLKNNGGLVAIDDTMEFNRLQIPTEV 
LNFLNSYKFSTSQVAKAFGLPVSKLGIETVNTSITQANLEYLQSTLDPIFKMMIAELETK 
IFKFIDSGNELEFDSSRLIDIDPELQLQRITELHSKGIISTDEARSVFGYQPIEHGEQPL
VDLNRAPLNTLENYQKSKIDKEVEKNSIKGGDEYDE*

Sequence 3223 

Contig_0733_pos_6406_5885 

is similar to (with p-value 2.0e-46)
>gp:gp|AB009866|AB009866_4 Bacteriophage phi PVL proviral D
NA, complete sequence. NID: g3341907.

atggttgtcgaaggttacgccattatctttaattcaatgagtgatgatttgggtggattt 
agagaaattgtagcgcctaatgctttaaatgatgtagatgtaagtgatgtgaaatgtcta 
atcaatcatgattttagttatgttataggacgcacacaagcaggaacgcttgagctacag
gtggatgaaaaagggctatactttaaatgccacttacctaatacatcatacgcaagagat 
atttatgagaatattaaagcaggcaacgttaatcagtgcagtttcttttacacattgcca 
cctaatgactcaacggctcgtacgtggcaaaacatagataatgagtacgttcaaaccata 
aataaaatcgatgaattgattgaggttagtattgttacagtgccagcctacaaagataca 
tcggttgaagtcggtcaacgtgcgaaagacttaaagaaattcaaacagttggaacaaatg
aagatagcattggatttagaaagcctacgttttgaaacgtaa 

Sequence 3224 

MVVEGYAIIFNSMSDDLGGFREIVAPNALNDVDVSDVKCLINHDFSYVIGRTQAGTLELQ
VDEKGLYFKCHLPNTSYARDIYENIKAGNVNQCSFFYTLPPNDSTARTWQNIDNEYVQTI
NKIDELIEVSIVTVPAYKDTSVEVGQRAKDLKKFKQLEQMKIALDLESLRFET* 

Sequence 3225 

Contig_0733_pos_5844_4492
>gp:gp|X97563|BPHA3GP3_5 Bacteriophage A2 gp3 gene and 4 op
en reading frames. NID: g1523807.

atggctaatttagatgagcgcaaaaaagaaatcgctaatctgatttctaaagcgcaagaa 
gcagtcgaaaaaggcgacctcgaaactgctcgtaatttaaaagctgatattgatgctcaa 
aagaaagagtacgaagaactcgaacagctttcaaaagaaattgaagcgtcagcacctaaa

caagatgaaccacctaaagatgaaggtgcagaagttgaagataacaaagatggtaattct
ggagaagaatcagagaacaaaccttctgatgatgaaccagaaggaacttcagatgaagaa 
aaacctgatgatgcaccaaaaccagatgacaaacctgaagaaacaccagaaacacctact 
attgaaaaagtagaagaaccaacagaagaagaattaaaaaaagaaaaagacaaaaaagaa 
ggagcgaaacgttctatggctaaattaaaccaaaatccagagacaaacgaagaaattcta 

gcatttgaacagtacatgaaatcaaaaggggctaaacgtgacaatgttaaatctgatgac
gttggcgtaactatcccagaggatattaaatatattcctgaaaaagaagttaagacagtc 
caagacttatcagaattggtacaaaaaacttcagtatcaactgcaagtgggaaatacccg 
atcttaaaacgtgctaacgctaaattcaacactgttgctgaattagagaaaaaccctgag 
ttagctcgtccggaattcgaaacaatcaattgggaagtagacacttatcgtggatctatt 

ccgatttcacaagaagcattagatgattcagttgctaacttaactgctattgtttctgaa
aatattaacgaacaaaaaatcaacactttaaatgaacgtattggtgaagttttaaaagca 
ttcaatcctactagtgtttctaatgttgacgacttaaaagaaattatcaacgttaaatta 
gatcctggttatgaccgccaaattatctgtactcaaagtttctatcaaaaactagataca 
ttaaaagatggtaacggtcgttatttactacaagacagtatcatcaacactgcaggtaac 

actgtgttaggtatgaatgtaacagttgtgcgtgatgacttgttaggtaaaaatggagat
gcattagcatttattggtgatgtaaaacgcggtgtgttatttgcagaccgtacagacgtt 
tctgttcaatggattgaaaatgaaatctacggtaaatacttaatgggtgctttccgtttc 
gatgtgaaacaggctgataaaaatgctggtttcttcgtaacatttgaagagcgtttatat 
tacttcatattgggcaatggatgtatacgatga 


Sequence 3226 

MANLDERKKEIANLISKAQEAVEKGDLETARNLKADIDAQKKEYEELEQLSKEIEASAPK 
QDEPPKDEGAEVEDNKDGNSGEESENKPSDDEPEGTSDEEKPDDAPKPDDKPEETPETPT 
IEKVEEPTEEELKKEKDKKEGAKRSMAKLNQNPETNEEILAFEQYMKSKGAKRDNVKSDD
VGVTIPEDIKYIPEKEVKTVQDLSELVQKTSVSTASGKYPILKRANAKFNTVAELEKNPE
LARPEFETINWEVDTYRGSIPISQEALDDSVANLTAIVSENINEQKINTLNERIGEVLKA 
FNPTSVSNVDDLKEIINVKLDPGYDRQIICTQSFYQKLDTLKDGNGRYLLQDSIINTAGN 
TVLGMNVTVVRDDLLGKNGDALAFIGDVKRGVLFADRTDVSVQWIENEIYGKYLMGAFRF 
DVKQADKNAGFFVTFEERLYYFILGNGCIR* 


Sequence 3227
Contig_0734_pos_2701_2249
is similar to (with p-value 9.0e-25)
>sp:sp|P00937|TRPG_YEAST ANTHRANILATE SYNTHASE COMPONENT II
(EC 4.1.3.27) (CONTAINS: GLUTAMINE AMIDOTRANSFERASE; INDOLE
-3-GLYCEROL PHOSPHATE SYNTHASE (EC 4.1.1.48) (PRAI)). >pir:p
ir|S38049|NNBY2 anthranilate synthase multifunctional enzyme
- yeast (Saccharomyces cerevisiae) >gp:gp|X75951|SC60RF_9 S
.cerevisiae URA1, SAC1, RSD1 and TRP3 genes and 6 new orfs.
NID: g473130. >gp:gp|Z28211|SCYKL211C_1 S. cerevisiae chromos
ome XI reading frame ORF YKL211c. NID: g486376.
gtggtggtaatacatctacatttattgcattatcgtcagtatggacaagctggtgcatct 
attattttattaatagtaaatattttaagtgatgaccaattaaaagaattgtattcatat 

gcaacaaaccataatttagaagctctagtagaagttcatacaattagagaacttgaacgt
gcacaccaaattaaccctaaaattattggtgttaataatcgtgatttaaaacgatttgaa 
accgatgttctacatacaaataaattacttaagtttaaaaagtctaattgctgctacatt 
tcagagagtggcattcatacaaaagaagatgttgagaaaatagtagattcaagtattgac 
ggtttacttgtaggggaggcattaatgaaaacaaatgacttaagtcagtttttgcctagt 

ttaaagttaaagaagaatctctatgatagttaa

Sequence 3228

VVVIHLHLLHYRQYGQAGASIILLIVNILSDDQLKELYSYATNHNLEALVEVHTIRELER
AHQINPKIIGVNNRDLKRFETDVLHTNKLLKFKKSNCCYISESGIHTKEDVEKIVDSSID 
GLLVGEALMKTNDLSQFLPSLKLKKNLYDS*

Sequence 3229 
Contig_0737_pos_1135_1560 
is similar to (with p-value 2.0e-19)
>gp:gp|AF001974|AF001974_1 Thermoanaerobacter ethanolicus p
utative TrkG gene, partial cds, and putative TrkA, xylose is
omerase (xylA) and xylulose kinase (xylB) genes, complete cd
s. NID: g2581794.

gtgagaccgactgtaccgaatgctgatacgacttcgaagagtattttgattaaaggtata 
ttggaattgattatggtaagtataaacgtaaccataccgataaatgcaatagaaatgaga
atagttacgaaagacaactgtatgtatctttctgatatttctctattaaatatagagttg 
tttttttctttgcgtatcgtattaaaaatagcgattgttgcaataacaaaagttgtaacc 

tttataccacctgcagcactcaatggtgcacctccaataaacatgagagccataagtaat 
aaagctgtcggtgttttaatgtttccaacgtcaattgtgttaaatcctgcagtccttgtt 
gtcactgattggaaaaatgcatttcctattttttcaattaatcccatgtgtaacatagag
ttttga 

Sequence 3230 

VRPTVPNADTTSKSILIKGILELIMVSINVTIPINAIEMRIVTKDNCMYLSDISLLNIEL 
FFSLRIVLKIAIVAITKVVTFIPPAALNGAPPINMRAISNKAVGVLMFPTSIVLNPAVLV
VTDWKNAFPIFSINPMCNIEF* 

Sequence 3231 
Contig_0737_pos_1585_2199 

is similar to (with p-value 3.0e-19)

>sp:sp|P43440|NTPJ_ENTHR V-TYPE SODIUM ATP SYNTHASE SUBUNIT
J (EC 3.6.1.34) (NA(+)-TRANSLOCATING ATPASE SUBUNIT J). >g
p:gp|D17462|ENENTP_11 Enterococcus hirae ntp genes for Na+-
ATPase subunits, complete cds. NID: g487271.

gtgcctatcaaaattaataatccagtggtagttaacacgagtttagagtgtaaggataat
tttctaaaacttttggcgttccacaaatcaaccacgactaaatgtcccaaacctcccaaa 
atgataagtattggaatagtgataatgattaccggatcatttgaaaaatcgattaagttg
tttttaaaaagggcgaatcctgcgttgttaaatgcggaaactgaagtgaataaacttaaa 
aatagacctttacctatgccaaattttggaataaacgataaacatagacaaagtgtacca 

aataattcagtggcgatgctgtatatggctagatgtttaataagtttaattacaccaccg
ggttcgtcaatattccaagtaatcataaataaaattctattgttaattgatattttctta 
tttaaaaagatgagagttagcattgctacagtgacaatacctaatccaccaatttgtata 
agtaataagataatgatttctccaaaaatattaaattgtgttccaacatcaactggtgat 
agacctgttactgtgaatgcgctagaagctacaaacaatgcgtcaataaagttaataggt
ttcttccctgtatag

Sequence 3232

VPIKINNPVVVNTSLECKDNFLKLLAFHKSTTTKCPKPPKMISIGIVIMITGSFEKSIKL
FLKRANPALLNAETEVNKLKNRPLPMPNFGINDKHRQSVPNNSVAMLYMARCLISLITPP
GSSIFQVIINKILLLIDIFLFKKMRVSIATVTIPNPPICISNKIMISPKILNCVPTSTGD 
RPVTVNALEATNNASIKLIGFFPV* 

Sequence 3233 
Contig_0737_pos_2249_978

is similar to (with p-value 1.0e-54)

>gp:gp|D89592|D89592_3 Vibrio alginolyticus rhlE, KtrA and
KtrB genes, complete cds. NID: g3927863.

atgttatttctgttgacaactttaattggtgcttttctactctatttgccctatacaggg 

aagaaacctattaactttattgacgcattgtttgtagcttctagcgcattcacagtaaca
ggtctatcaccagttgatgttggaacacaatttaatatttttggagaaatcattatctta 
ttacttatacaaattggtggattaggtattgtcactgtagcaatgctaactctcatcttt 
ttaaataagaaaatatcaattaacaatagaattt tat ttatgat tact tggaatattgac 
gaacccggtggtgtaattaaacttattaaacatctagccatatacagcatcgccactgaa 

ttatttggtacactttgtctatgtttatcgtttattccaaaatttggcataggtaaaggt
ctatttttaagtttattcacttcagtttccgcatttaacaacgcaggattcgcccttttt 
aaaaacaacttaatcgatttttcaaatgatccggtaatcattatcactattccaatactt 
atcattttgggaggtttgggacatttagtcgtggttgatttgtggaacgccaaaagtttt 
agaaaattatccttacactctaaactcgtgttaactaccactggattattaattttgata 

ggcacggttttcttctttttactagaaaatcaaaactctatgttacacatgggattaatt
gaaaaaataggaaatgcatttttccaatcagtgacaacaaggactgcaggatttaacaca 
attgacgttggaaacattaaaacaccgacagctttattacttatggctctcatgtttatt 
ggaggtgcaccattgagtgctgcaggtggtataaaggttacaacttttgttattgcaaca 
atcgctatttttaatacgatacgcaaagaaaaaaacaactctatatttaatagagaaata 

tcagaaagatacatacagttgtctttcgtaactattctcatttctattgcatttatcggt
atggttacgtttatacttaccataatcaattccaatatacctttaatcaaaatactcttc 
gaagtcgtatcagcattcggtacagtcggtctcactatggatttaacttccgaatactat 
aattggactgagtttattattatcatcgtaatgttatgtggtaaaattggattactgaat 
attagtagagcgcttgttccacctaaagaccctaaaaattatagatataccaaaggacac 

attcacttataa

Sequence 3234

MLFLLTTLIGAFLLYLPYTGKKPINFIDALFVASSAFTVTGLSPVDVGTQFNIFGEIIIL 
LLIQIGGLGIVTVAMLTLIFLNKKISINNRILFMITWNIDEPGGVIKLIKHLAIYSIATE 

LFGTLCLCLSFIPKFGIGKGLFLSLFTSVSAFNNAGFALFKNNLIDFSNDPVIIITIPIL
IILGGLGHLVVVDLWNAKSFRKLSLHSKLVLTTTGLLILIGTVFFFLLENQNSMLHMGLI
EKIGNAFFQSVTTRTAGFNTIDVGNIKTPTALLLMALMFIGGAPLSAAGGIKVTTFVIAT
IAIFNTIRKEKNNSIFNREISERYIQLSFVTILISIAFIGMVTFILTIINSNIPLIKILF
EVVSAFGTVGLTMDLTSEYYNWTEFIIIIVMLCGKIGLLNISRALVPPKDPKNYRYTKGH
IHL*

Sequence 3235 

Contig_0738_pos_1259_24

>sp:sp|O34863|UVRA_BACSU EXCINUCLEASE ABC SUBUNIT A. >gp:gp
|Z99122|BSUB0019_13 Bacillus subtilis complete genome (secti
on 19 of 21): from 3597091 to 3809700. NID: g2636029. >gp:gp
|AF017113|AF017113_13 Bacillus subtilis 300-304 degree genom
ic sequence. NID: g2618830.

atgcgtgatttaggtaatacacttattgtcgttgaacatgacgatgatactatgagagca 
gctgattatttagttgatgtgggtccgggagctggtaaccacggtggagaggttgtctca
agtggtacccctaataaagtaatgaaagataaaaaatccttaactggtcaatatttaagt 
ggaaaaaaacgaattgaagtccctgaatacagacgagaaatcaccgatagaaagattcaa 
attaaaggtgctaaaagtaataatttgaaaaatgtaaatgtagacttcccactatctgtc 
ttaactgttgttacaggtgtgtcaggttctggtaaaagttcactcgtcaatgaaatttta
tataaagcattagctcaaaaaattaataaatctaaagtgaaacctgggaattttgatgaa
attaaaggaattgatcaattagataaaatcattgatattgatcaatcgccaataggtaga 
acaccacgttctaacccagccacatacactggtgtctttgatgacataagagatgtcttt 
gcacaaacgaatgaagctaaaatacgaggttatcaaaaaggtagatttagttttaatgtc 
aaaggtggacgatgtgaagcttgtaaaggtgatggaattataaaaattgaaatgcatttt
ttaccagatgtctatgtaccttgtgaagtatgtgatggtaaacgctataatcgtgagact 
ttagaggtaacatacaaaggtaaaaatattgcggatgtattagaaatgactgttgaagaa 
gctacgcatttctttgaaaatattcctaagattaaacgtaaattacaaacacttgtagat 
gttgggttggggtacattactttaggtcaacaagctactacattatctggtggcgaagcg 

caacgtgtaaaactcgcatcagaattgcacaaacgttcaacggggcgttctatttatatt
ctcgatgaaccaactacaggattacatgtcgacgatataagtcgtttattaaaggtattg 
aatcgtatagtggaaaatggtgatacggtcgttattatcgaacacaatcttgatgttatt 
aaaacggctgatcatattattgatttaggtccagaaggcggtgaaggtggaggaacaatc 
atcgcaactggtacacctgaagagattgctcaaaataaagggtcttacactggtcaatac 

ttaaaaccagtattagagagagacagcgttgaatag

Sequence 3236 

MRDLGNTLIVVEHDDDTMRAADYLVDVGPGAGNHGGEVVSSGTPNKVMKDKKSLTGQYLS
GKKRIEVPEYRREITDRKIQIKGAKSNNLKNVNVDFPLSVLTVVTGVSGSGKSSLVNEIL
YKALAQKINKSKVKPGNFDEIKGIDQLDKIIDIDQSPIGRTPRSNPATYTGVFDDIRDVF
AQTNEAKIRGYQKGRFSFNVKGGRCEACKGDGIIKIEMHFLPDVYVPCEVCDGKRYNRET
LEVTYKGKNIADVLEMTVEEATHFFENIPKIKRKLQTLVDVGLGYITLGQQATTLSGGEA
QRVKLASELHKRSTGRSIYILDEPTTGLHVDDISRLLKVLNRIVENGDTVVIIEHNLDVI
KTADHIIDLGPEGGEGGGTIIATGTPEEIAQNKGSYTGQYLKPVLERDSVE*


Sequence 3237 
Contig_0741_pos_5345_6325
is similar to (with p-value 1.0e-21)
>sp:sp|P14940|ADH_ALCEU ALCOHOL DEHYDROGENASE (EC 1.1.1.1).
>pir:pir|A30196|A30196 alcohol dehydrogenase (EC 1.1.1.1) -
Alcaligenes eutrophus >gp:gp|J03362|AFAADH_1 A.eutrophus al
cohol dehydrogenase (ADH) gene, complete cds. NID: g141899.
atgtttaaaaagattgctactataataggttcgacattatttggtacagttttattcgca 
aaagtgaaagaaaagcgtagttataaaagttttttacaagagaaaatgattagaatatca 

ggaatgaaaaagacatttgaaagtatagatgacgcgaaaaaagctttgaatgagactaaa
tatcaaactgcaggtaaatataatggaacaacatatgaatttaagcataaagttcaaata 
agagattattatggttctttagtctatgttgttaatgatcatggtcttccagatcaacgc 
acggtcttatatgtacatggaggcgcatggttccaagatcctttggaaaatcattttgaa 
tatttagacttactcgttgatgcgctcgatgctagggtgattatgcccgtatatcctaaa 

ataccacacagagattatcgtacgacatttgaattattaacaaaaatatataagcgatta
ttgactaaaattgatgaacctgaaaacttggtcatcattggagattcagccgggggacaa 
attgcattagcttttgcacaaatgttaaaaaaagagcaactcagtcaacctggccatatt 
gttcttatttcaccggtgcttgatgcgacatttaagaatccagaagcaagaaaatatgaa 
aaagaagatccaatgcttggaattgaaggcagtaaatatcttgtagagttatgggctggt 

gatgcaccactagatgactataagatgtctccaatgaatggtgatttagaaggcctagga
catattacacttactgtaggaaccaaagaaacattatatcctgatgcagttaagttctct 
cacatgttaaatgataaaggaataaagcatcagtttatcccaggttacaatttatttcat 
atttatcccttattccctatcccagagcgtcaacgctttttagaacagcttaaaaaaatc 
attgtcacaaaagagttataa 


Sequence 3238 

MFKKIATIIGSTLFGTVLFAKVKEKRSYKSFLQEKMIRISGMKKTFESIDDAKKALNETK 
YQTAGKYNGTTYEFKHKVQIRDYYGSLVYVVNDHGLPDQRTVLYVHGGAWFQDPLENHFE 
YLDLLVDALDARVIMPVYPKIPHRDYRTTFELLTKIYKRLLTKIDEPENLVIIGDSAGGQ
IALAFAQMLKKEQLSQPGHIVLISPVLDATFKNPEARKYEKEDPMLGIEGSKYLVELWAG
DAPLDDYKMSPMNGDLEGLGHITLTVGTKETLYPDAVKFSHMLNDKGIKHQFIPGYNLFH
IYPLFPIPERQRFLEQLKKIIVTKEL*

Sequence 3239
Contig_0741_pos_3244_2630

is similar to (with p-value 2.0e-21)

>gp:gp|L38252|ACCEST_2 Acinetobacter lwoffii orf1 and ester
ase (est) genes, complete cds. NID: g1209221.
atgttatcagatatacttccaacaggttatgaaattggtgttttaaaaggtaaagttaaa
cctggctgtacagtagccattgtaggtgctggtcctgtaggtttagcagcattacttaca 
gcacaattctattcaccttcaaaaattattatgattgatttagatgataatagattagaa 
accgctaaagaactaggtgctacgcatttaattaactctaaagagactgaaaccgcaatt 
aaaaaggtaaaatcgttaaatccacgtggtgttgatgttgctattgaagctgtcggaatt 

ccacaaacctttgatttatgtcaaaatttaattggtgtcgatggtacgattgctaatgtt
ggtgtgcatgggttacctgtacaacttgatatagataaattatggattaaaaatattaac 
gtaactactggtttagtttcaggaaatacaactgaagaattacttgaagcgttaaaaagc 
aaaataatacaaccagaacaactcgtgacacattatagtaaactgagtgaaatcgaaagt 
gcctatgatttatttagaaatgcaacagatcataaagcgattaaattaatcatagagaat 

gatatcacaatttaa

Sequence 3240

MLSDILPTGYEIGVLKGKVKPGCTVAIVGAGPVGLAALLTAQFYSPSKIIMIDLDDNRLE 
TAKELGATHLINSKETETAIKKVKSLNPRGVDVAIEAVGIPQTFDLCQNLIGVDGTIANV 
GVHGLPVQLDIDKLWIKNINVTTGLVSGNTTEELLEALKSKIIQPEQLVTHYSKLSEIES
AYDLFRNATDHKAIKLIIENDITI*

Sequence 3241 
Contig_0743_pos_0_688 

is similar to (with p-value 1.0e-50)

>sp:sp|P70814|RIBG_BACAM RIBOFLAVIN-SPECIFIC DEAMINASE (EC
3.5.4.-). >gp:gp|X95955|BARIBGENS_1 B.amyloliquefaciens ribB
, ribG, ribA, ribH & ribT genes. NID: g1592687.
atggatgatgctattcaactagcaaaaatggtaaatggacaaacaggtgttaatccacca 

gtaggatccgttgttgttaaaaacggtaggattgtaggtttaggtgcacatttaaaaaag
ggagataaacatgccgaagtacaagctattgaaatggcaggtttaaatacccaaggtgct 
accatatacgtttcattagaaccttgcacacaccatggttcaacaccaccttgtgtgcat 
aaaatcattgaagcgggcatatctaaggtcatctatgctgttaaagatactactttagta
agtaagggtgacgagattctgagagaagctggtatagaggttgaatttcaatataatgaa 

aatgcagctgcattataccgtgacttttttactgctaaaagaaacgaagttccagaagta
actgtaaaggtctcatctagtctagatggtaaacaagcaacagactttaatgaaagtaag 
tggataacaaacaaagaagttaaagaagatgtttatcaattaagacatgagcatgatgca 
gttattactgggcgtagaaccattgaagcagacaatccattgtatacaaccagggttcct 
gatggaaagcatccgattcgagttattctttctaagaaaggtcaactcgattttaatcaa 

caaatatttaaagatactgcatcCCCTG

Sequence 3242 

MDDAIQLAKMVNGQTGVNPPVGSVVVKNGRIVGLGAHLKKGDKHAEVQAIEMAGLNTQGA
TIYVSLEPCTHHGSTPPCVHKIIEAGISKVIYAVKDTTLVSKGDEILREAGIEVEFQYNE
NAAALYRDFFTAKRNEVPEVTVKVSSSLDGKQATDFNESKWITNKEVKEDVYQLRHEHDA
VITGRRTIEADNPLYTTRVPDGKHPIRVILSKKGQLDFNQQIFKDTASPX 

Sequence 3243 

Contig_0744_pos_2445_3218

is similar to (with p-value 5.0e-69)

>gp:gp|U96107|SCU96107_3 Staphylococcus carnosus N5,N10-met
hylenetetrahydromethanopterin reductase homolog, SceB precur
sor (sceB) and putative transmembrane protein genes, complet
e cds, and putative Na+/H+ antiporter NhaC (nhaC) gene, part
ial cds. NID: g2735503.

atgaaaaaaatcgctacagctacaattgcaactgcaggaatcgctactttcgcatttgca 
caccatgacgcacaagcagcagaacaaaataatgatgggtacaatccaaacgacccttat 
tcatatagctacacttacacaatcgatgctgaaggtaactaccactacacttggaaaggt 
aactggagtccagatcgtgtaaatacttcatataactataataattataataactacaac
tactatggttacaataactatagcaactacaataactacagtaattacaacaattacaac
aactatcaatcaaacaacacgcaatcacaaagaacaactcaaccgactggtggtttaggc 
gcaagctattcaacatcaagtagtaatgttcacgttacaacaacttctgcgccatcatca 
aacggtgtatctttatcaaacgctcgctcagcatctggtaacttatacacttcaggtcaa 
tgtacatattatgtatttgacagagtaggtggcaaaatcggttcaacgtggggtaacgca
aacaactgggcaaacgctgcagcacgttctggttacacagtaaacaattcgcctgctaaa 
ggtgcaatcttacaaacgtcacaaggtgcatacggacacgtagcatacgttgaaggtgta 
aacagcaatggttcaatcagagtttcagaaatgaactacggtcacggtgcaggtgttgtc 
acttcacgtacaatctctgcgagccaagctgcttcatataactatattcactaa 


Sequence 3244 

MKKIATATIATAGIATFAFAHHDAQAAEQNNDGYNPNDPYSYSYTYTIDAEGNYHYTWKG
NWSPDRVNTSYNYNNYNNYNYYGYNNYSNYNNYSNYNNYNNYQSNNTQSQRTTQPTGGLG
ASYSTSSSNVHVTTTSAPSSNGVSLSNARSASGNLYTSGQCTYYVFDRVGGKIGSTWGNA
NNWANAAARSGYTVNNSPAKGAILQTSQGAYGHVAYVEGVNSNGSIRVSEMNYGHGAGVV
TSRTISASQAASYNYIH*

Sequence 3245 

Con t i g_0 7 4 4_pos_7 4 02_7 740 

is similar to (with p-value 9.0e-30)

>sp:sp|P37279|ATCS_SYNP7 CATION-TRANSPORTING ATPASE PACS (E
C 3.6.1.-). >pir:pir|S36741|S36741 cation-transporting ATPas
e pacS - Synechococcus sp. >gp:gp|D16437|SYOPACS_1 Synechoco
ccus sp. DNA for PacS, complete cds. NID: g435124.

atggtagaaccaatcactgaatcgccactacttttttcaacaggtatagactcaccagtt
agcatggattcgtcaatagaagtatcacctttagtgactttgccatctacaggtatcttt 
tcgccgggttttattagtaaagtatctccgactttaactttatcaagtggaagcataatt 
tctttattttctttaattactcgtgcttctttcgcttgtaaatttaacaattcgcttaat 
gcattggtagtctgtgatttggcacgtgcttctaaatatttaccaagaagaattaacgta 

attaaaatagcacttgtttcaaaatataaatgcgggtga

Sequence 3246

MVEPITESPLLFSTGIDSPVSMDSSIEVSPLVTLPSTGIFSPGFISKVSPTLTLSSGSII
SLFSLITRASFACKFNNSLNALVVCDLARASKYLPRRINVIKIALVSKYKCG*

Sequence 3247 
Contig_0744_pos_9912_8887
is similar to (with p-value 3.0e-66) 
>sp:sp|P56157|DP3A_HELPY DNA POLYMERASE III, ALPHA CHAIN (E
C 2.7.7.7). >gp:gp|AE000646|HPAE000646_3 Helicobacter pylori
section 124 of 134 of the complete genome. NID: g2314635.
atgggatttgatgaaacgagtttaaatgagatttcaaaacttattccacataaattaggt 
ataactcttgaagaagcataccaaaagccagagtttaaagcatttgttcatcgtaatcat 
agaaatgaacgttggtttgaagtgagtaaaaagttagagggattaccaagacatacgtct 

acgcatgctgcaggtatcattatcaatgatcaaccattattcaaatttgccccattaaca
actggtgatacaggattattaacgcagtggactatgacagaagcggaacgtataggatta 
ttaaaaattgatttcttgggattacgcaatctatcaattattcatcaaattattttacaa
gttaaaaaggatttaaatataaatattgatatagaagctataccttatgatgataaaaaa 
gtttttgatttattatcaaacggtgacactacaggtatatttcaattggaatcagacggt 

gttagaagcgtattaaaaagattgcaacccgaacattttgaagatatcgtagctgtcaca
tcattatatagaccaggaccaatggaagaaataccaacttatataacccgtagacataat 
cctaaccaatttgcttatttacatccagatttagaaccaatcttaaaaaacacatatggt
gttatcatttatcaagaacaaataatgctaatagcaagtcaagttgctggttttagttat 
ggtgaagcagatattttaagaagggcaatgagtaaaaagaatcgtgcaatcttagaaagt 

gagcgtcaacatttcattgatggtgcaaaaaataacggttacgatgaacagataagtaag
caaatttttgatttaatacttaagtttgcagattatgggttcccacgtgcccatgctgtt 
agttactcaaaaattgcatacattatgagctatttaaaagtgccacaaatcagtacagaa 
ccgttatgcctggcttcaaatatgaagcaccaccaaatcaaaataaaatcaatccttatg 
aattaa 
Sequence 3248

MGFDETSLNEISKLIPHKLGITLEEAYQKPEFKAFVHRNHRNERWFEVSKKLEGLPRHTS 
THAAGIIINDQPLFKFAPLTTGDTGLLTQWTMTEAERIGLLKIDFLGLRNLSIIHQIILQ 
VKKDLNINIDIEAIPYDDKKVFDLLSNGDTTGIFQLESDGVRSVLKRLQPEHFEDIVAVT
SLYRPGPMEEIPTYITRRHNPNQFAYLHPDLEPILKNTYGVIIYQEQIMLIASQVAGFSY 
GEADILRRAMSKKNRAILESERQHFIDGAKNNGYDEQISKQIFDLILKFADYGFPRAHAV 
SYSKIAYIMSYLKVPQISTEPLCLASNMKHHQIKIKSILMN* 

Sequence 3249

Contig_0746_pos_2558_3991 

is similar to (with p-value 2.0e-45) 
>sp:sp|P26648|SUFI_ECOLI SUFI PROTEIN PRECURSOR. >gp:gp|U28
377|ECU28377_116 Escherichia coli K-12 genome; approximately
65 to 68 minutes. NID: g882431. >gp:gp|AE000384|AE000384_1
Escherichia coli K-12 MG1655 section 274 of 400 of the compl
ete genome. NID: g2367186.

atgtataataaagtttttgcaattttaattataattttttccataataattattgcgtct 
aatgatactttcgcagaaagtaagaatgatatgatgaatatgaaagaagataagaaaaat 

acaatggatatgacaaatatgaaacatcatgacgaaagaaagaaattaaattcttcacaa
ggaaaaaatgaaataatatttcctaaagttgcagagtcaaaaaaagataacaatggttat 
aaaaattatacattaaaagctcaggaaggaaagacagagttttacaaaaataatttttct 
aatactctaggctacaatggaaatttacttggaccaactttaaaattaaaaaaaggagat 
aaagttaaaattaagttaataaataacttagatgaaaatacaacatttcattggcatgga 

ttagaagtaaatggaaaagtggatggagggccttctcaggttataaaaccaggaaaagaa
aaaactataaaatttgaggttaatcaagattctgctacgttatggtatcacccccacccc 
tctccaaatacagctaaacaagtttataatggcttatcaggattattatatatagaagat 
agtaaaaagaataattatcctagtgattatggaaaaaatgatttgcctataataatccaa 
gataaaacatttgtatctaaaaaattaaattattcaaaaacgaaagacgaagatggcact 

caaggtgatactgttcttgtgaacggaatagtaaaccccaaactgacaacaaaagaagag
aaaatacgtttgagacttttaaatggttctaatgctcgagatttaaatcttaagctaagt 
aataatcaaagttttgagtatattgcttcagatggcggtcaattaaaaaacgctaaaaaa 
ttaaaagaaattaatttagctccttcagaaagaaaagaaatagtaatagatttatctaaa 
atgaaaggcgagaaaatcagtctggttgataatgataaaactgtaattttaccgattagt 

aacaaagagaaaagttctaacaaaggtaatacaccaaaagtaagtaaaaaaataaaatta
gaaggtatgaatgataatgttaccattaatggtaataaattcgatcctaaaagaatagat 
tttacacaaaagttaaaccagaaagaagtatgggaaattgaaaacgtcaaagataaaatg 
ggtggtatgaaacatcctttccacatccatggaacgcaatttaaagttttatctgtggat 
ggggagaaaccaccaaaagatatgaggggtaaaaaagatgttatatctttggaacctgga 

caaaaagctaaaatagaggttgtatttaaaaatactggaacatacatgtttcactgtcat
atacttgagcatgaagataatggaatgatgggtcaaataaaagtaacaaactaa 

Sequence 3250 

MYNKVFAILIIIFSIIIIASNDTFAESKNDMMNMKEDKKNTMDMTNMKHHDERKKLNSSQ
GKNEIIFPKVAESKKDNNGYKNYTLKAQEGKTEFYKNNFSNTLGYNGNLLGPTLKLKKGD
KVKIKLINNLDENTTFHWHGLEVNGKVDGGPSQVIKPGKEKTIKFEVNQDSATLWYHPHP 
SPNTAKQVYNGLSGLLYIEDSKKNNYPSDYGKNDLPIIIQDKTFVSKKLNYSKTKDEDGT 
QGDTVLVNGIVNPKLTTKEEKIRLRLLNGSNARDLNLKLSNNQSFEYIASDGGQLKNAKK 
LKEINLAPSERKEIVIDLSKMKGEKISLVDNDKTVILPISNKEKSSNKGNTPKVSKKIKL 
EGMNDNVTINGNKFDPKRIDFTQKLNQKEVWEIENVKDKMGGMKHPFHIHGTQFKVLSVD
GEKPPKDMRGKKDVISLEPGQKAKIEVVFKNTGTYMFHCHILEHEDNGMMGQIKVTN*

Sequence 3251 
Contig_0748_pos_3358_4230
is similar to (with p-value 1.0e-38)

>sp:sp|P44948|FPG_HAEIN FORMAMIDOPYRIMIDINE-DNA GLYCOSYLASE
(EC 3.2.2.23) (FAPY-DNA GLYCOSYLASE). >pir:pir|A64104|A6410
4 formamidopyrimidine-DNA glycosylase (fpg) homolog - Haemop
hilus influenzae (strain Rd KW20) >gp:gp|U32776|U32776_1 Hae
mophilus influenzae Rd section 91 of 163 of the complete gen
ome. NID: g1573969.

atgcctgaactacctgaagttgaacatgttaaaagaggtattgagccatttataaaaagt 
gcaaaaatagagaaagtaacttttgctaaaaatgtaattaacggtaagaataataaccgt 
gagacgattataaaaggtatggaattagatacttttaaaaaacttactgaaggttatgtt
ataaaaaaagttgaaagaagaagtaagtacattattttttatatagcggatcatgacgat 
gatagaatcttagttagtcatttaggtatggcaggcggattctttgttgttaataacctt 
gatgagataagtacaccgaattatcgaaagcattggcaagtcattttcgatttggataat 
aaacaaaaattagtctattctgatatcagacggtttggagaaattagaaatatagtcaat 

tttgatagttatccatctttattagaaatcgctccagaaccatttgaagaggtggcattt
gaacactatttagaatgtttgacaatgaaaaaatataagaataaaccaataaaacaaacg 
attcttgatcatcgtgttatagcaggagctggaaatatctatgcctgtgaagctttattc 
agagctggtattactccggataaaattactaattcactcactaaacaagaaagaaaatcc 
ctcttttattatgttcgagaagttttagaagagggtataaaatatggaggtactagtatt 

tcagattataggcatgcagatggtaaaactggacaaatgcaattacatttaaatgtatat
aaacaaaaaaagtgcaaggtttgtggtcattcgattgaaacaaaagtgatagctggtaga
aatagtcatttttgcccaaactgtcagagataa

Sequence 3252

MPELPEVEHVKRGIEPFIKSAKIEKVTFAKNVINGKNNNRETIIKGMELDTFKKLTEGYV
IKKVERRSKYIIFYIADHDDDRILVSHLGMAGGFFVVNNLDEISTPNYRKHWQVIFDLDN
KQKLVYSDIRRFGEIRNIVNFDSYPSLLEIAPEPFEEVAFEHYLECLTMKKYKNKPIKQT 
ILDHRVIAGAGNIYACEALFRAGITPDKITNSLTKQERKSLFYYVREVLEEGIKYGGTSI 
SDYRHADGKTGQMQLHLNVYKQKKCKVCGHSIETKVIAGRNSHFCPNCQR* 

Sequence 3253 
Contig_0748_pos_4883_5752
is similar to (with p-value 4.0e-89) 
>gp:gp|Z82038|CTZ82038_4 C.thermosaccharolyticum etfB, etfA
, hbd, thlA and actA genes. NID: g1667352. >gp:gp|Z92974|TTB
CSOPRN_6 T.thermosaccharolyticum BCS operon DNA. NID: g19033
26.

gtgtttggtggtgtatttaaggatatacctgcctatgaactaggtgcaacagttattcgt 
caaattttagaacatagtcaaatagatcctaatgaaatcaatgaagttattctaggaaac 

gtattacaggcaggtcaaggacaaaatcctgctcgtattgctgcgattcatggtggtgtg
ccagaagcggtaccttcttttactgtaaataaagtttgcggttctggattaaaagcgatt 
caacttgcctatcaatctattgtagcgggagataatgagattgttatcgctggaggcatg 
gaaagtatgtctcaatctccaatgcttcttaaaaatagtcgtttcggttttaaaatggga 
aatcaaactttagaagatagtatgatagctgatggtttaactgataagtttaatgattac 

catatgggtatcacagccgaaaatctagttgaacagtatcagattagtcgtaaagaacaa
gatcaatttgcattcgattctcaacaaaaagcatcacgtgcacaacaagctggtgtattt 
gatgctgaaattgtacctgtagaggtaccacaacgtaaaggcgaccccctaattatttct 
caagatgaaggcattagacctcaaacgacaattgataagttagcacaactccgtccagca 
tttaaaaaagatggatcagtaactgctggtaatgcatccggtatcaatgacggtgctgct 

gctatgctcgttatgacggaggacaaagcgaaagcattgggcttacaacctatagctgta
ttagatagttttggtgcgagtggtgtggcgccttcaattatgggtattcgacgcacaaaa 
acaaaccaaagaagaaatagcaactgttaa

Sequence 3254 

VFGGVFKDIPAYELGATVIRQILEHSQIDPNEINEVILGNVLQAGQGQNPARIAAIHGGV
PEAVPSFTVNKVCGSGLKAIQLAYQSIVAGDNEIVIAGGMESMSQSPMLLKNSRFGFKMG 
NQTLEDSMIADGLTDKFNDYHMGITAENLVEQYQISRKEQDQFAFDSQQKASRAQQAGVF 
DAEIVPVEVPQRKGDPLIISQDEGIRPQTTIDKLAQLRPAFKKDGSVTAGNASGINDGAA
AMLVMTEDKAKALGLQPIAVLDSFGASGVAPSIMGIRRTKTNQRRNSNC* 

Sequence 3255 
Contig_0753_pos_1053_58 
is similar to (with p-value 3.0e-40) 
>gp:gp|AB011003|AB011003_1 Candida albicans CaUAP1 gene for
UDP-N-acetylglucosamine pyrophosphorylase, complete cds. NI
D: g3273313.

atgttagacaaaaatcaattggaaaagtataaccaagagcatttgtatgaatatgaaaaa 
ttaatgagtagtaatgaaaagaatgctttagatgaaaaagtagatcagttaaatcttgca 
gaaattcaagatttatatcaagatttatatgttaatagaaaaactattgatgatgtatct
tctgtatctgaagtcaaatatgaagtgaaatcacgactcaatgaagaagaacgacataca 
tatgaacaaaaaggttatgaggcaatacgaaatggtgaatttgctgtattattgatggct 
ggaggacaaggtacgcgtttaggatataaagggcctaaaggttcttttgaaatagagggt 
acgagtttatttgaacttcaggcgcgtcaactgattcgtttaaaagaagaaaccggccac 

acaattaattggtatattatgacaagtgacattaatcataaagatacaatagagtatttt
aaacaacataaatattttaactatgatgccaatcatattcatttctttaagcaagataac 
attgttgctttaagtgaagaaggaaagcttgttttaaatagagatggacatataatggaa 
acacctaatggtaatgggggtgtattcaagtctcttaagaaagcaggataccttgataag 
atgcaacaagatcacgtcaaatatatcttcttaaataacattgataatgtcttagttaaa 

gttttagacccgttatttgccggttttacagtgacacaaagtaaagacatcacatcaaaa
acaattcaacctaaagatagtgaaagtgtaggtcggcttgtaaatgttgattgtaaagac 
actgtgttagagtattctgaattaatgtcatatttccaggagctgcactccagttcattt 
ctgaaactaaaatacttccatcgctattcacgcgctctacaaacgctacgtgaccatagt 
aaccagcgtcagtttgtgcaattgagcctactgtag 

Sequence 3256 

MLDKNQLEKYNQEHLYEYEKLMSSNEKNALDEKVDQLNLAEIQDLYQDLYVNRKTIDDVS
SVSEVKYEVKSRLNEEERHTYEQKGYEAIRNGEFAVLLMAGGQGTRLGYKGPKGSFEIEG 
TSLFELQARQLIRLKEETGHTINWYIMTSDINHKDTIEYFKQHKYFNYDANHIHFFKQDN 
IVALSEEGKLVLNRDGHIMETPNGNGGVFKSLKKAGYLDKMQQDHVKYIFLNNIDNVLVK
VLDPLFAGFTVTQSKDITSKTIQPKDSESVGRLVNVDCKDTVLEYSELMSYFQELHSSSF
LKLKYFHRYSRALQTLRDHSNQRQFVQLSLL* 

Sequence 3257 

Contig_0754_pos_7866_0

is similar to (with p-value 4.0e-44)

>sp:sp|P36649|YACK_ECOLI PROBABLE 53.4 KD BLUE-COPPER PROTE
IN YACK PRECURSOR. >gp:gp|AE000121|AE000121_8 Escherichia co
li K-12 MG1655 section 11 of 400 of the complete genome. NID
: g1786306.

atgtataataaagtttttgcaattttaattataattttttccataataattattgcgtct 
aatgatactttcgcagaaagtaagaatgatatgatgaatatgaaagaagataagaaaaat 
acaatggatatgacaaatatgaaacatcatgacgaaagaaagaaattaaattcttcacaa 
ggaaaaaatgaaataatatttcctaaagttgcagagtcaaaaaaagataacaatggttat 

aaaaattatacattaaaagctcaggaaggaaagacagagttttacaaaaataatttttct
aatactctaggctacaatggaaatttacttggaccaactttaaaattaaaaaaaggagat 
aaagttaaaattaagttaataaataacttagatgaaaatacaacatttcattggcatgga 
ttagaagtaaatggaaaagtggatggagggccttctcaggttataaaaccaggaaaagaa 
aaaactataaaatttgaggttaatcaagattctgctacgttatggtatcacccccacccc 

tctccaaatacagctaaacaagtttataatggcttatcaggattattatatatagaagat
agtaaaaagaataattatcctagtgattatggaaaaaatgatttgcctataataatccaa 
gataaaacatttgtatctaaaaaattaaattattcaaaaacgaaagacgaagatggcact
caaggtgatactgttcttgtgaacggaatagtaaaccccaaactgacaacaaaagaagag 
aaaatacgtttgagacttttaaatggttctaatgctcgagatttaaatcttaagctaagt 

aataatcaaagttttgagtatattgcttcagatggcggtcaattaaaaaacgctaaaaaa
ttaaaagaaattaatttagctccttcagaaagaaaagaaatagtaatagatttatctaaa 
atgaaaggcgagaaaatcagtctggttgataatgataaaactgtaattttaccgattagt 
aacaaagagaaaagttctaacaaaggtaatacaccaaaagtaagtaaaaaaataaaatta 
gaaggtatgaatgataatgttaccattaatggtaataaattcgatcctaaaagaatagat 

tttacacaaaagttaaaccagaaagaagtatgggaaattgaaaacgtcaaagataaaatg
ggtggtatgaaacatcctttccacatccat 

Sequence 3258 

MYNKVFAILIIIFSIIIIASNDTFAESKNDMMNMKEDKKNTMDMTNMKHHDERKKLNSSQ
GKNEIIFPKVAESKKDNNGYKNYTLKAQEGKTEFYKNNFSNTLGYNGNLLGPTLKLKKGD
KVKIKLINNLDENTTFHWHGLEVNGKVDGGPSQVIKPGKEKTIKFEVNQDSATLWYHPHP 
SPNTAKQVYNGLSGLLYIEDSKKNNYPSDYGKNDLPIIIQDKTFVSKKLNYSKTKDEDGT 
QGDTVLVNGIVNPKLTTKEEKIRLRLLNGSNARDLNLKLSNNQSFEYIASDGGQLKNAKK
LKEINLAPSERKEIVIDLSKMKGEKISLVDNDKTVILPISNKEKSSNKGNTPKVSKKIKL
EGMNDNVTINGNKFDPKRIDFTQKLNQKEVWEIENVKDKMGGMKHPFHIH 

Sequence 3259 
Contig_0755_pos_2032_3393

is similar to (with p-value 8.0e-68)

>gp:gp|L41217|RICNRAMP_1 Oryza Sativa integral membrane pro
tein (OsNramp) mRNA. NID: g2231131. >gp:gp|S81897|S81897_1 O
sNramp1=Nramp1 homolog/Bcg product homolog [Oryza sativa, in
dica, cv. IR 36, etiolated shoots, mRNA, 1967 nt]. NID: g147
0319.

atgggggtgattttattgaattccaataataacaatcatgaacaacaacgaagtttagat 
gaaatcaataacaccataaacttcaatcataatgatagtgcaagtcaaaaatttctggct 
tttttaggaccgggattgcttgttgcagttggttacatggatcctggaaattggattaca 
tccatgcaaggaggagcacaatatggctataccttgttattcataatcttaatctctagc 

ttatctgctatgctgttacaaagtatgactgtgagattaggaatagcaactggtatggat
ttagcacaaatgacacgtcattttttaaataagcctgtagcaattatgttctggattatt 
gcagaattagcaattatcgctactgatattgcagaagttataggtagcgctatcgcatta 
gatttaatcttcggcataccattaattgtaggcgcattaatcactgtatttgatgtattt 
ttattattattcatcatgaaatttggctttagaaagattgaagctatcgtgggaacgtta 

atctttaccgtattggccatttttgtatttgaagtttatatttcttctccacaaataaca
gatatgcttaatggttttgtgcctcataaagaaattattacaaaccaagggatactttat 
attgcactaggtatcataggtgctactattatgccacataacttatatttacattcttct 
attgtacaatctcgaaaatatgatagacacagtattcatgaaaaagcacaagcgattaag 
tatgctactatcgactctaatatacagctatccatcgcttttgtagtcaattgcttatta 

cttacacttggtgcagcgctattttttggaactaaaactgaagatttgggtggtttttat
gatctttatttggctctaaaaacagaacctgctttaggtgcaacgcttggcggtattatg 
agtactttatttgctgttgcccttttagcttctggtcaaaattcaactataacgggaacg 
ttagcaggccaaattgtgatggaaggatttcttaaattatccattccaaattggttacgt 
cgtcttatcactcggtctttagcagtgatacctgttatcatttgtcttatagtatttaaa 

ggaaatactgaaaaaattgaacaattacttgtcttttctcaagtgttcttgagtattgct
ttgccattttcgttaataccgcttcaattagctacaagtaatcaaaatcttatgggtcct 
tttaagaataaaacatggattaacatcatttcttggttactcataattgtcttaagtgga 
cttaacgtatatcttatcattcaaacattccaagaattatga 

Sequence 3260

MGVILLNSNNNNHEQQRSLDEINNTINFNHNDSASQKFLAFLGPGLLVAVGYMDPGNWIT
SMQGGAQYGYTLLFIILISSLSAMLLQSMTVRLGIATGMDLAQMTRHFLNKPVAIMFWII 
AELAIIATDIAEVIGSAIALDLIFGIPLIVGALITVFDVFLLLFIMKFGFRKIEAIVGTL
IFTVLAIFVFEVYISSPQITDMLNGFVPHKEIITNQGILYIALGIIGATIMPHNLYLHSS 

IVQSRKYDRHSIHEKAQAIKYATIDSNIQLSIAFVVNCLLLTLGAALFFGTKTEDLGGFY
DLYLALKTEPALGATLGGIMSTLFAVALLASGQNSTITGTLAGQIVMEGFLKLSIPNWLR
RLITRSLAVIPVIICLIVFKGNTEKIEQLLVFSQVFLSIALPFSLIPLQLATSNQNLMGP
FKNKTWINIISWLLIIVLSGLNVYLIIQTFQEL*

Sequence 3261

Contig_0755_pos_6203_5673

No hits found 

atgggattaaataaattagggtatagtcttaatgacaaaaatccgacccatattcatcaa 
gcagaaaaagatttgcataatttagcacctcaagttagagggatagtaggcgacgaaatt 
actatgatgcttcaacaaaacgaaggacatgttgcagtagtttggagtggcgttgctgca
ccacttgtacaggaaaatactcgttataattacgtgatacctaaagaaggctctaaccta 
tggtttgataatatggtgatacctaaaactgcacaaaataaagaaggtgcgtataagttt 
atgaatttcttactagacgcgcaaaatagtgcccagaatacggaatgggtaggatatgca 
acacctaataaagcagctcgaagtaagttgcctaaaaaggtaagaaatgattatagattt 
tatccatcaaatcaagaacagcaacggttagaagtctataaagatttaggtcaaacgtct
ctcagtgaatataatgaaagctttttaaattttaaaatgtctttaaaatag 

Sequence 3262 

MGLNKLGYSLNDKNPTHIHQAEKDLHNLAPQVRGIVGDEITMMLQQNEGHVAVVWSGVAA
PLVQENTRYNYVIPKEGSNLWFDNMVIPKTAQNKEGAYKFMNFLLDAQNSAQNTEWVGYA 
TPNKAARSKLPKKVRNDYRFYPSNQEQQRLEVYKDLGQTSLSEYNESFLNFKMSLK* 

Sequence 3263 

Contig_0755_pos_3989_3612

is similar to (with p-value 8.0e-25) 

>sp:sp|P23861|POTD_ECOLI SPERMIDINE/PUTRESCINE-BINDING PERI
PLASMIC PROTEIN PRECURSOR (SPBP). >pir:pir|D40840|D40840 spe
rmidine/putrescine transport protein D - Escherichia coli >g
p:gp|D90747|d90747_2 Escherichia coli genomic DNA.(25.1 - 25
.5 min). NID: g1651548. >gp:gp|AE000212|AE000212_9 Escherich
ia coli K-12 MG1655 section 102 of 400 of the complete genom
e. NID: g1787358. >gp:gp|M64519|ECOPOTABCD_4 E.coli transpor
t protein (potA, potB, potC and potD) genes, complete cds. N
ID: g147325.

atgtcatctgaaatgtttttagtttctcttattgcgtcattagtcattttaggtttttca 
acgctattaggttttgttggtacaatggtaattgaaggtagaaaaaaccttgctgctagt 
ttactcatagcagcggctatcgtaggtttatttacgactaatttaatcgcaatggtttta 
tggatgattgctgcgattagactttttgcaaaaaaagataaaacagatgtaaatgaaaat 

gctacggctcaacttcgtcaaaaccattcaaagagccaaagtgattggaatcatcaacaa
aaccaacaacagaaagatgcttgggatcctgaacaagaaatcaacaaacaaaaaaaggac 
gatccatatatatattaa 

Sequence 3264 

MSSEMFLVSLIASLVILGFSTLLGFVGTMVIEGRKNLAASLLIAAAIVGLFTTNLIAMVL
WMIAAIRLFAKKDKTDVNENATAQLRQNHSKSQSDWNHQQNQQQKDAWDPEQEINKQKKD 
DPYIY* 

Sequence 3265 

Contig_0756_pos_5439_6113

is similar to (with p-value 3.0e-39) 

>gp:gp|U67196|TMU67196_1 Thermotoga maritima DNA-binding re
sponse regulator (drrA) and histidine protein kinase (hpkA)
genes, complete cds, thymidne/pyrimidine phosphorylase homol
og gene, partial cds. complete cds. NID: g1575576.

atgattaattgcttaatcgtagacgatgataaaaagttattgcaatatgtttcaagtcat 
ttagaaagagaaagtattcaaacacatactttcacaagtggagaagcatcactagatttt 
cttgaaaataaaaatgttgatattgcgatagtagatattatgatgagtggaatggatggt 
tttgagctttgtcagactttgaaagatgattatcatattcctgtcataatgttaacagct 

45 agagatgcattaagtgataaagaacgtgcatttctaagtgggactgacgattatgtcact 
aaaccttttgaggttaaagaattattatttagaattaaagctgtcttaagacgatatcaa 
attaatgctgataacgagttacaacttggcaacttaatattaaatcagtcttacatggaa 
attactgtgggttcaaaaacgatgaatcttccaaacaaagaatttcagttgttattttta 
ttagcctctaatcctaaacatattttcactcgagatgatattattgaaaaaatttggggc 

50 ttcgattatgaaggagatgatcgtactgttgatgttcatattaaaagattacgtcaacgt 
ttatctaaattgaaatcatcagtatcaattcaaactgtaagaggtcaaggatatagggtg 
gaccaaaatgtttaa 

Sequence 3266 

MINCLIVDDDKKLLQYVSSHLERESIQTHTFTSGEASLDFLENKNVDIAIVDIMMSGMDG
FELCQTLKDDYHIPVIMLTARDALSDKERAFLSGTDDYVTKPFEVKELLFRIKAVLRRYQ
INADNELQLGNLILNQSYMEITVGSKTMNLPNKEFQLLFLLASNPKHIFTRDDIIEKIWG 
FDYEGDDRTVDVHIKRLRQRLSKLKSSVSIQTVRGQGYRVDQNV* 
Sequence 3267
Contig_0757_pos_1852_2325
is similar to (with p-value 1.0e-26)
>gp:gp|AJ000084|PMAJ84_3 Proteus mirabilis ccm and pat gene
s and partial ygbA gene. NID: g3395515.

atgaatgctaatgcccaagtacgtagatacttccctagtttacttagttatcgtcgttct
gagatagacatggaaaatatggatagcataatacaagagagtggcctaggtttatttgct 
gtagagttgaaagagacgggagaatggttagggtttataggtgtgaattacgtttcaaaa 
gatagccattacccttttaaagagttaccgttttatgaaataggttggaggctaattcca 
gaagtatggggaaatggtctagctacagaaggggcagaagccgtaatgaaatatgctaga
gataaaggaattaaagaattatatagttttacttctgaaaataatttgccttctagaaaa 
gtcatggaaaaattaggaatgacttttttagacaattttgaatatccgaatcttagtaaa 
taccatccccttaaacgtcatgtaagatattataaagagctacttccttcttga

Sequence 3268

MNANAQVRRYFPSLLSYRRSEIDMENMDSIIQESGLGLFAVELKETGEWLGFIGVNYVSK
DSHYPFKELPFYEIGWRLIPEVWGNGLATEGAEAVMKYARDKGIKELYSFTSENNLPSRK 
VMEKLGMTFLDNFEYPNLSKYHPLKRHVRYYKELLPS* 



Sequence 3269

Contig_0757_pos_4021_3656

is similar to (with p-value 2.0e-29) 
>sp:sp|P46378|FAS6_RHOPA HYPOTHETICAL 21.1 KD PROTEIN IN FA
SCIATION LOCUS (ORF6). >pir:pir|F55578|F55578 hypothetical p
rotein 2 (ipt 3' region) - Rhodococcus fascians plasmid pFiD
188 >gp:gp|Z29635|RFCCIPTFD_6 R.fascians (D188) genes for P4
50 cytochrome, isopentenyl transferase and ferredoxine. NID:
g455000.

atggacaaaaagcttcaaaagaatattgagaaacgtcataaagaagaacaaaaacaacgt 
gaagctaatcagaaacaacgtattaaagacatgaaaaaaactcaaaaatacgaagagcaa
gttggcttaactcctggtaaaatagatcacgaaattgagaaaaaaggcgaaaaactagaa 
aaagataatcgtaaagatattaaaaaattagataaaaagcttcaaaagaatattgaaaaa 
cgtcataaagaagaacaaaaacagcgtgaagaagcagagaaagctagaaaaaaagaattt 
aaaaaatatgaaaattacgtggctgacagtgtcgtaaaacaacataaggaatcaaatcat 
tcttaa

Sequence 3270 

MDKKLQKNIEKRHKEEQKQREANQKQRIKDMKKTQKYEEQVGLTPGKIDHEIEKKGEKLE 
KDNRKDIKKLDKKLQKNIEKRHKEEQKQREEAEKARKKEFKKYENYVADSVVKQHKESNH
S*



Sequence 3271 
Contig_0757_pos_3193_2627
No hits found 

atgaaaagaattgctgtttattgtggtgcaagtaaagggaaaaacccatcttatgttaaa
gaggcatacgaattaggaaaatatatggctgaacaaggatacgagcttgtattcggagca 
ggatcagtcggcattatgggagctattcaagatggcatacttgagcatggcggtaaagct 
atcggtgtcatgcctaaaatgttagatgaacgagaaataacaagccaaaaagtaagtgag 
cttatattagtagattctatgcatgaacgaaaaaataaaatgactgaacttgccgatgct 

tttattatggctccaggcggtgctggttcattagaagaattttttgaaatgtatagtegg
gctcaaattggtatacaccaaaagcctataggtgtatttaatttaaatggattctttgag 
ccactacaacacttaatcgaccatatgattaaagaaggatttattgatgagaaatatcaa 
aagcttgcacctttatatgatactaaagaatcactcatcgaaggacttaaacattacaaa 
ccacttggtgtacgtacatacgattaa 

Sequence 3272 

MKRIAVYCGASKGKNPSYVKEAYELGKYMAEQGYELVFGAGSVGIMGAIQDGILEHGGKA 
IGVMPKMLDEREITSQKVSELILVDSMHERKNKMTELADAFIMAPGGAGSLEEFFEMYSW 
AQIGIHQKPIGVFNLNGFFEPLQHLIDHMIKEGFIDEKYQKLAPLYDTKESLIEGLKHYK 
PLGVRTYD*

Sequence 3273 
Contig_0759_pos_3773_3006
is similar to (with p-value 9.0e-58)

>gp:gp|U38892|SSU38892_2 Synechocystis sp. ruvB gene, compl
ete cds, and secA gene, partial cds. NID: g1256587.
atgtttaaaataggaaatttagaattacaatctcgtttacttttaggtactggaaaattt 
gaaaatgaagaggttcagtcaaaagcaattgaggcatctgaaacaaatgtacttacattt 

gcagtaagacgtatgaatttatatgatcgtaacctacctaacccacttgcaaacgttaat
ttaaaagattttatcacttttccaaatactgcaggtgccaaaacagctcaagaagctatc
agaattgctgaaattgctaatcacgcaggtgtatgtgacatgattaaagtcgaagtcatt 
ggtgatgacgaaacattattacctgatccattcgaaacatacgaggcatgcaaagtattg 
ttagaaaaaggttacactgtttgtccttacatctctaacgatttagttttagctcaacgt 

ttagaagaattgggtgtacacgcagttatgccacttgcatcccctattggtacaggaaga
ggtattaataacccattaaatttaagttatattatcgaaaatgctagtgtacctgtaatc 
gtagatgctggtattggttcccctaaagatgcgtgtcatgccatggagcttggcgcagat 
ggtattttactcaacacagccatttcagcggcaaaagatcctgtgaaaatggctgaagca 
atgaaattaggtataaatgctggcagactttcatatgaagctggacgcattcctgttaag 

tatactgcacaagcatctagtccatcagaaggtttagggttcttgtaa

Sequence 3274 

MFKIGNLELQSRLLLGTGKFENEEVQSKAIEASETNVLTFAVRRMNLYDRNLPNPLANVN
LKDFITFPNTAGAKTAQEAIRIAEIANHAGVCDMIKVEVIGDDETLLPDPFETYEACKVL 
LEKGYTVCPYISNDLVLAQRLEELGVHAVMPLASPIGTGRGINNPLNLSYIIENASVPVI
VDAGIGSPKDACHAMELGADGILLNTAISAAKDPVKMAEAMKLGINAGRLSYEAGRIPVK 

YTAQASSPSEGLGFL* 

Sequence 3275

Contig_0763_pos_1317_2216

is similar to (with p-value 1.0e-47)

>gp:gp|U76260|PAU76260_2 Peptostreptococcus asaccharolyticu
s alpha- and beta-subunits of L-serine dehydratase (sdhB) an
d (sdhA) genes, complete cds. NID: g2315864.

atgtttgattcaattagagagacaatagattattcagttgaaaataacatcagttttgct
gatatgatgattaatgatgaaatggaaagagaaggtaaatctcgcgaagaagtgcgtgat 
ttaatgagacaaaacttaaatgttatgcgtgaggcagttgaaaaaggtacgactggcgat 
ggtgtggaaagtgtaacaggttatacaggtcatgatgctgctaaacttcgagattacaac
gagaataatcatgcattatcaggtcatgaaatgattgatgcagtcaaaggtgcagttgca 

acgaacgaagtcaatgcagcaatgggtattatttgtgctactccgacagctggttcctcg
ggaacgattcccggcgtaatatttaaattagaaaaaactcataatatcactgaagatcaa 
atgatagattttctattcacatcagctttattcgggcgagtagttgcaaacaacgcgagc 
gttgccggtgcaactggtggttgtcaagccgaagtgggttcggcatctgcaatggctgca 
gctgctgcagtatcaatttttaacgggtcaccagaacaatcaggacatgccatggcattg 

gcaattagtaacttattaggcttagtttgcgatccagttgctggtttagttgaaattcca
tgtgtaatgcgtaatgctattggttcaggaaatgcattaatatctgcagaccttgcacta 
gctggagttgaaagtcaaattccagttgatgaagtcataggtgctatggatagagtaggt 
cgtaatttacctgcatcattaagagaaacaggtttaggcggtttagcaggtacacctact
ggcgaagaaattaaacgtaaaatattcggcgaagcagacaacatggttaaaaataaataa 

Sequence 3276 

MFDSIRETIDYSVENNISFADMMINDEMEREGKSREEVRDLMRQNLNVMREAVEKGTTGD
GVESVTGYTGHDAAKLRDYNENNHALSGHEMIDAVKGAVATNEVNAAMGIICATPTAGSS 
GTIPGVIFKLEKTHNITEDQMIDFLFTSALFGRVVANNASVAGATGGCQAEVGSASAMAA
AAAVSIFNGSPEQSGHAMALAISNLLGLVCDPVAGLVEIPCVMRNAIGSGNALISADLAL
AGVESQIPVDEVIGAMDRVGRNLPASLRETGLGGLAGTPTGEEIKRKIFGEADNMVKNK* 



Sequence 3277 
Contig_0765_pos_2844_3362

is similar to (with p-value 3.0e-17)

>sp:sp|P04928|SANT_PLAFN S-ANTIGEN PROTEIN PRECURSOR. >pir:
pir|B22011|YAZQN7 S-antigen precursor - Plasmodium falciparu
m (strain NF7/Ghana) >gp:gp|m10130|PFASA7_1 Plasdmodium falc
iparum (isolate NF7) S antigen gene, complete cds. NID: g160
670.

gtgcttctgcattgcctccttcttctgccttcgttggctcagattgaggtgcttctgcat 
tgcttccttcttctgtcttcgttggctcagattgagctgcttctgcattgcctccttctt 

ctgccttcgttggctcagattgagctgcttttgcgttgcttccttcttctgtcttcgttg
gctcagattgaggtgcttctgcattgcttccttcttctgccttcgttggctcagattgag 
ctgcttttacgttgcttccttcttctgtcttcgttggctcagattgagctgcttctgcat 
tgcctccttcttctgccttcgttggctcagattgagctgcttctgcattgcctccttctt 
ctgccttcgttggctcagattgaggtgcttctgcattgcctccttcttctgccttcgttg 

gctcagattgagctgcttctgcattgcctccttcttctgccttcgttggctcagattgag
gtgcttctgcattgcctccttcttctgcttgcaatgtag 

Sequence 3278 

VLLHCLLLLPSLAQIEVLLHCFLLLSSLAQIELLLHCLLLLPSLAQIELLLRCFLLLSSL 
AQIEVLLHCFLLLPSLAQIELLLRCFLLLSSLAQIELLLHCLLLLPSLAQIELLLHCLLL
LPSLAQIEVLLHCLLLLPSLAQIELLLHCLLLLPSLAQIEVLLHCLLLLLAM* 

Sequence 3279 
Contig_0768_pos_1456_1800 

is similar to (with p-value 4.0e-59)

>pir:pir|I67760|I67760 transposase (insertion sequence IS10
) - Escherichia coli >gp:gp|S67119|S67119_2 BST=somatotropin
...BST/beta-Gal fusion protein [Escherichia coli, LBB84, pla
smid pXT107, IS10L/R-1, PlasmidTransposonInsertionMutant, 3
genes, 1679 nt]. NID: g455674.

atgcagattgaagaaaccttccgagacttgaaaagtcctgcctacggactaggcctacgc 
cacagccgaacgagcagctcagagcgttttgatatcatgctgctaatcgccctgatgctt 
caactaacatgttggcttgcgggcgttcatgctcagaaacaaggttgggacaagcacttc 
caggctaacacagtcagaaatcgaaacgtactctcaacagttcgcttaggcatggaagtt 

ttgcggcattctggctacacaataacaagggaagactcactcgtggctgcaaccctgctt
actcaaaatctattcacacatggttacgttttggggaaattatga 

Sequence 3280 

MQIEETFRDLKSPAYGLGLRHSRTSSSERFDIMLLIALMLQLTCWLAGVHAQKQGWDKHF 
QANTVRNRNVLSTVRLGMEVLRHSGYTITREDSLVAATLLTQNLFTHGYVLGKL*

Sequence 3281 

Contig_0769_pos_8606_8947

is similar to (with p-value 6.0e-18)

>sp:sp|Q41364|SOT1_SPIOL 2-OXOGLUTARATE/MALATE TRANSLOCATOR
PRECURSOR. >gp:gp|A47930|A47930_1 Sequence 1 from Patent WO
9534654. NID: g2301793. >gp:gp|U13238|SOU13238_1 Spinacia ol
eracea envelope membrane 2-oxoglutarate/malate translocator
(SODiT1) mRNA, chloroplast mRNA encoding chloroplast protein
, complete cds. NID: g595680.

atgagtttgcttaaccatgggataaagcctaacttgtttaattgttctgccattaataca 
agaactgagaaccaaacgagtgtattccatgcgcctgtttcatttaaaatatctgaccac 
gctaatacacctgttaatagtaacaatgctaaagcaataaatgcagtgagcgtggcatca 
acattaatgaagcttcctaatacccacaaagccaatgctatgataaagacaccaaccatc 

aatttttcggctatagacatatgtcccatttcttctagttgttcagtagcccatttttta
gcgttaggcgtttctttaacagtaggtgggtataatttataa 

Sequence 3282 

MSLLNHGIKPNLFNCSAINTRTENQTSVFHAPVSFKISDHANTPVNSNNAKAINAVSVAS 
TLMKLPNTHKANAMIKTPTINFSAIDICPISSSCSVAHFLALGVSLTVGGYNL*

Sequence 3283 
Contig_0769_pos_9387_8242 
>sp:sp|Q41364|SOT1_SPIOL 2-OXOGLUTARATE/MALATE TRANSLOCATOR
PRECURSOR. >gp:gp|A47930|A47930_1 Sequence 1 from Patent WO
9534654. NID: g2301793. >gp:gp|U13238|SOU13238_1 Spinacia ol
eracea envelope membrane 2-oxoglutarate/malate translocator
(SODiT1) mRNA, chloroplast mRNA encoding chloroplast protein
, complete cds. NID: g595680.

atggcctttttcatttcaagaggatttgtaaaaacagggctaggtcgacgtattgctctg 
caattcgttaaattatttggaaagaaaacgcttggtttggcttattcacttgttggtgtt 
gaccttatcttagctcctgctacgccaagtaatacagcacgtgctggtggtattatgttt 
ccaatcattaagtccttgtcagagtcatttggttcatcgccgagagatggttctgagaga 

aaaatgggtgcgtttttaatctttactgagttccaaggtaatttaattacttcagctatg
tttttaacagctatggccggtaaccctatagcgcaaagtttagctgaaaaaacggcacac 
gttcaaattacatggatgaattggtttgttgctgctattatacccggattgatttctctc 
atcgttgtccctttcattatttataaattatacccacctactgttaaagaaacgcctaac 
gctaaaaaatgggctactgaacaactagaagaaatgggacatatgtctatagccgaaaaa 

ttgatggttggtgtctttatcatagcattggctttgtgggtattaggaagcttcattaat
gtcgatgccacgctcactgcatttattgctttagcattgttactattaacaggtgtatta
gcgtggtcagatattttaaatgaaacaggcgcatggaatacactcgtttggttctcagtt 
cttgtattaatggcagaacaattaaacaagttaggctttatcccatggttaagcaaactc 
attgctcaaggtttgaatggctttagttggcctatcgttttagttttactcatcttgttt 

tatttctactcacattatttattcgcaagtgcaacagcacatgtcagcgccatgtacgcc
gcgttactcggtgttgcagtcgcttcgggtgcaccgccattattcagtgcattaatgtta 
gggttctttggtaacttactggcatcaacaacacactatagtagtggaccagcgcctata 
ttatacgcagatggctatgttacacaaaagcgctggtggactatgaatattgtacttggt 
atagtctattttattatttggattggtgtaggttcactatggatgaaactcattggtatg

atgtaa

Sequence 3284 

MAFFISRGFVKTGLGRRIALQFVKLFGKKTLGLAYSLVGVDLILAPATPSNTARAGGIMF 
PIIKSLSESFGSSPRDGSERKMGAFLIFTEFQGNLITSAMFLTAMAGNPIAQSLAEKTAH 
VQITWMNWFVAAIIPGLISLIVVPFIIYKLYPPTVKETPNAKKWATEQLEEMGHMSIAEK
LMVGVFIIALALWVLGSFINVDATLTAFIALALLLLTGVLAWSDILNETGAWNTLVWFSV
LVLMAEQLNKLGFIPWLSKLIAQGLNGFSWPIVLVLLILFYFYSHYLFASATAHVSAMYA 
ALLGVAVASGAPPLFSALMLGFFGNLLASTTHYSSGPAPILYADGYVTQKRWWTMNIVLG 
IVYFIIWIGVGSLWMKLIGMM*

Sequence 3285 
Contig_0771_pos_5868_0
is similar to (with p-value 1.0e-34)
>sp:sp|P45476|YHCC_ECOLI HYPOTHETICAL 34.6 KD PROTEIN IN AR
CB-GLTB INTERGENIC REGION (F309). >gp:gp|U18997|ECOUW67_140
Escherichia coli K-12 chromosomal region from 67.4 to 76.0 m
inutes. NID: g606010.

atgcatgaaaaatggagtgaaggtcaatatattgcatactttcaggcgtttacaaatacg 
catgcacctgttgaagtactaaaagaaaaatatgaacctgtcttaaaagaagatggcgtc 

gttggtttatcaatcgcgacaagacctgattgtttgcctgatgatgttgtagaatattta
gctgaacttaatcagcgcacttacttatgggtagaattgggcctacaaactgtgcatcag 
tcaacttccgatttaataaatcgtgctcatgatatgcaaacatactatgacggcgtaaca 
aaattacgcaaacataatataaatgtttgtacgcacataatcaacggcttaccaggtgaa 
aattatgacatgatgatggagactgctaaagaagtcgctcagatggacgttcaaggtatt 

aaaattcatttattacacttgctaaaaggaacgcctatggttaaacaatatgaaaaaggt
atgctcgagtttatg 

Sequence 3286 

MHEKWSEGQYIAYFQAFTNTHAPVEVLKEKYEPVLKEDGVVGLSIATRPDCLPDDVVEYL
AELNQRTYLWVELGLQTVHQSTSDLINRAHDMQTYYDGVTKLRKHNINVCTHIINGLPGE
NYDMMMETAKEVAQMDVQGIKIHLLHLLKGTPMVKQYEKGMLEFM 

Sequence 3287 
Contig_0774_pos_1335_622

is similar to (with p-value 2.0e-89) 

>sp:sp|P39149|UPP_BACSU URACIL PHOSPHORIBOSYLTRANSFERASE (E
C 2.4.2.9) (UMP PYROPHOSPHORYLASE) (UPRTASE). >pir:pir|S4936
4|S49364 uracil phosphoribosyltransferase - Bacillus subtili
s >gp:gp|Z38002|BSSPORUPP_10 B.subtilis spoII-R, glyC and up
p genes. NID: g556877. >gp:gp|Z99122|BSUB0019_186 Bacillus s
ubtilis complete genome (section 19 of 21): from 3597091 to
3809700. NID: g2636029.

atgaagctaaagagagagttcacgcactcacatctaaatatccattatataattaataga 
catacacattggaggaaaatgattatgagtaaagtacatgtttttgatcacccattaata
caacacaaactaagttatattagagatgctcgcactggaacaaaagagtttagggaactt 
gtagatgaagtcggtatgttaatggcttatgaagtaactagagacttagaactgcaagat 
gttgaaatacaaacacctgtgactaaaatgacagctaaacgtttggcgggtaaaaagtta 
gcaattgtacctattttaagagctggtctaggcatgacagatggtgtgttaagtcttgtt 
cctgctgctagggtaggacatataggactatatagagatccagagactcttgaagcggta
gagtactttgcgaaaatgcctcaagacatcgatgaacgtcaaattattgtggttgatcct 
atgcttgctactggtgcttcagctattgaagcaatttcttcattaaaaaaacgtggagct 
aaaagtatacgttttatgtgtttaatagctgcccctgaaggcgttgaaaaaatgcaagaa 
gcacacccagatgtagatatatatattgcggcattagatgaaaaattaaatgacaaagcg 
tatattacaccaggtttaggtgatgcaggggatagattattcggtactaaataa

Sequence 3288

MKLKREFTHSHLNIHYIINRHTHWRKMIMSKVHVFDHPLIQHKLSYIRDARTGTKEFREL 
VDEVGMLMAYEVTRDLELQDVEIQTPVTKMTAKRLAGKKLAIVPILRAGLGMTDGVLSLV 
PAARVGHIGLYRDPETLEAVEYFAKMPQDIDERQIIVVDPMLATGASAIEAISSLKKRGA
KSIRFMCLIAAPEGVEKMQEAHPDVDIYIAALDEKLNDKAYITPGLGDAGDRLFGTK* 

Sequence 3289

Contig_0775_pos_5036_4413

is similar to (with p-value 2.0e-43)

>sp:sp|P44310|KGUA_HAEIN GUANYLATE KINASE (EC 2.7.4.8) (GMP
KINASE). >pir:pir|H64139|H64139 5'guanylate kinase (gmk) ho
molog - Haemophilus influenzae (strain Rd KW20) >gp:gp|U3284
8|U32848_2 Haemophilus influenzae Rd section 163 of 163 of t
he complete genome. NID: g3212240.

atggataaggaaaaaggactgttaattgttctttcaggcccttcaggtgttggaaaggga 
actgttagaaagaagatatttgaagacccaactacttcatataagtattctatatcaatg 
acgacacgtcatatgcgtgaaggtgaaattgatggtgtagattacttctttaaaacaaag 
gaagaatttgaggcgttaattaaagacgaccagtttattgagtatgcacaatatgtaggt 

aattactatggtacacctgtacaatatgtaaaggatactatggaagaaggtcatgacgtc
tttttagaaatcgaagttgaaggtgctaagcaagtaagaaagaaatttccagatgcgttg 
ttcatatttttagcgcctccaagtttagatgacttgaaagaacgtcttgttggtagagga 
actgaatcagatgaaaagattcaaagtcgtgtgaacgaggcacgaaaagaagtagaaatg 
atgaatttatacgactacgttgtagttaacgacgaggttgaactcgctaagaatcgaatt 

cagtcaatagttgaagctgagcatttaaaaagagagcgaatcgaagctaaatatagaaaa
atgttactggaggtcaaaaaataa

Sequence 3290 

MDKEKGLLIVLSGPSGVGKGTVRKKIFEDPTTSYKYSISMTTRHMREGEIDGVDYFFKTK 
EEFEALIKDDQFIEYAQYVGNYYGTPVQYVKDTMEEGHDVFLEIEVEGAKQVRKKFPDAL
FIFLAPPSLDDLKERLVGRGTESDEKIQSRVNEARKEVEMMNLYDYVVVNDEVELAKNRI
QSIVEAEHLKRERIEAKYRKMLLEVKK*

Sequence 3291 
Contig_0779_pos_0_1180

is similar to (with p-value 6.0e-61) 

>gp:gp|U31756|BSU31756_2 Bacillus subtilis gamma-aminobutyr
ate permease (gabP) gene and spore coat protein (cotA) gene,
partial cds. NID: g969025.

atgattgccattgggggatgtataggaactggtctattcatgacttctggtggagctata 
catgacgcaggtgcattgggtgctttgattgcctatgcagttattggagcgatggtgttc 
tttctaatgacgtcgttaggggagatggcgacatatttgcctgtgtcaggttcatttagt
acttatgctacacgctttgtcgatccttcactaggttttgctttaggatggaattattgg 

ttcaactgggtgataacagtagcagctgatgttactattgcagcgcaagttatacaatat
tggtcccctatgcaaggtataccagcttgggtctggagttgtattttccttattattatt 
ttcgcgcttaattctttatccgttagagtatatggagagagtgaatattggttcgcactt
atcaaagtagttacagtcatcatatttataggaattggtatcttaactattttagggatt
atgggtggagaatttgtaggatttgatacgtttacaaaaggagatgggccaatactaggt

gggaatttaggaggtagcttgctatcaattcttggtgtatttctagtcgcaggcttctca
ttccaaggaactgaacttattggtattacagcaggtgaatctgaaaatccagaaagagca
gttccaaaagcgattaaacaagtattttggcgtatacttttattttacattctagctatt 
ttcattattggtatgttgattccatatgatagtaaggcattaatgggcggtggtgatagt 
atagctacttcaccttttacattagtatttaagaatgctggattagcttttgctgcttca 

tttatgaatgctgttatattaacaagtgtattatcagcaggtaactcaggaatgtatgct
tcaacaagaatgttatattcgatgagtaaagataaattagcttataattcttttggaaaa 
acaaataaaagtggcgtaccttatgtatctctaattgcaactggagtactagtcattctt
attttcgcattgcaacatttaagtggagatgcatatgaatacattgtagctgctagcgga
atgactggttttattgcttgggttggtatagcaatcagtcactttagatttagacgcgca 

tttgataaacaaaattatgataaatcaaaattaaaatatC

Sequence 3292 

MIAIGGCIGTGLFMTSGGAIHDAGALGALIAYAVIGAMVFFLMTSLGEMATYLPVSGSFS 
TYATRFVDPSLGFALGWNYWFNWVITVAADVTIAAQVIQYWSPMQGIPAWVWSCIFLIII 
FALNSLSVRVYGESEYWFALIKVVTVIIFIGIGILTILGIMGGEFVGFDTFTKGDGPILG
GNLGGSLLSILGVFLVAGFSFQGTELIGITAGESENPERAVPKAIKQVFWRILLFYILAI 
FIIGMLIPYDSKALMGGGDSIATSPFTLVFKNAGLAFAASFMNAVILTSVLSAGNSGMYA
STRMLYSMSKDKLAYNSFGKTNKSGVPYVSLIATGVLVILIFALQHLSGDAYEYIVAASG 
MTGFIAWVGIAISHFRFRRAFDKQNYDKSKLKYX

Sequence 3293
Contig_0786_pos_2712_3827
is similar to (with p-value 7.0e-97) 
>sp:sp|P96612|DDL_BACSU D-ALANINE--D-ALANINE LIGASE (EC 6.3
.2.4) (D-ALANYLALANINE SYNTHETASE). >gp:gp|AB001488|AB001488
_40 Bacillus subtilis genome sequence, 148 kb sequence of th
e region between 35 and 47 degree. NID: g1881226. >gp:gp|Z99
106|BSUB0003_103 Bacillus subtilis complete genome (section
3 of 21): from 402751 to 611850. NID: g2632653.

atggaggaaaacgaaatgacaaaagaaaatatttgtatcgtttttggaggtaaaagtgca
gaacacgatgtttcaattttaactgcacaaaatgttttaaacgcaattgataaagaacga
tatcaagttgatatcatttatataacaaacgatggtgaatggaagaaaaaagataatatt 
acacaagaaataaaaaatactgatgaactcgtcattaacgatgtagagactggagaaatc 
tcacagttactcagtaaaggtagtttaggaaaatcatatgatgcagtattcccattattg 

catggtccaaatggagaagatggaactatccaaggtctttttgaagtacttgatatacca
tatgtaggtaatggtgtgttagctgcttcaagctcaatggataaactcgtgatgaaacaa 
ttatttgagcatagaggtttacctcaattaccttatattagctttttaagaagtgagtat
gaaaaatatgaaaataatatcattaaattagttaatgataagttaacatatccggtattt 
gtaaaacctgctaatctcggttcaagtgttggtataagtaaatgtaacaatgaagaagaa 

ttaaaatctgggatagctgaagcattccaatttgatcgtaaacttgtcattgaacaaggg
attaatgctagagagatcgaagtagctgtcttaggtaacgattatcctgaaacgacatgg
cctggtgaggttgttaaggatgtagcgttttatgattataaatcaaagtataaagacggt 
aagattagattagatattccagcagatttagatcaagatgttcaaatgacattaagaaac 
atggcattagaggcctttaaagctactgattgttcgggattagttcgtgcagatttcttt 
gttactgatgataatcaaatttatattaatgaaacaaatgccatgccaggatttactgca
tatagtatgtatccaaacttatggaaaaatatgggcttatcttaccctgatttaattgct 
aaattgattgatttagctaaagaacgttatgaagataaaaagaaaaataaatattcacaa
atgtttcatctaaaaccgatcgttttttatatttaa 

Sequence 3294 

MEENEMTKENICIVFGGKSAEHDVSILTAQNVLNAIDKERYQVDIIYITNDGEWKKKDNI 
TQEIKNTDELVINDVETGEISQLLSKGSLGKSYDAVFPLLHGPNGEDGTIQGLFEVLDIP 
YVGNGVLAASSSMDKLVMKQLFEHRGLPQLPYISFLRSEYEKYENNIIKLVNDKLTYPVF 
VKPANLGSSVGISKCNNEEELKSGIAEAFQFDRKLVIEQGINAREIEVAVLGNDYPETTW
PGEVVKDVAFYDYKSKYKDGKIRLDIPADLDQDVQMTLRNMALEAFKATDCSGLVRADFF
VTDDNQIYINETNAMPGFTAYSMYPNLWKNMGLSYPDLIAKLIDLAKERYEDKKKNKYSQ 
MFHLKPIVFYI* 

Sequence 3295

Contig_0788_pos_6565_7542
is similar to (with p-value 8.0e-33) 
>gp:gp|AB011003|ab011003_1 Candida albicans CaUAP1 gene for
UDP-N-acetylglucosamine pyrophosphorylase, complete cds. NI
D: g3273313.

atgaaaatttttgattatgaagatatacaattaattcccaataaatgtattgttgaaagc 
agatctgagtgtaatacttcggttaaatttgggcctcgtacttttaaattgccagttgtt 
ccagcaaatatgcaaacagtcatgaatgaagaacttgcacaatggtttgcagaaaacgat 
tatttttatatcatgcatagatttaatgaagaaaatagaattccatttataaaaaaaatg 

catcatgcagggttatttgcttctatttctgttggagttaaagaaaacgaatttaatttt
attgaaaaattagcttcttcatcgctcataccagaatatataacaattgatattgctcat
ggtcactcaaattcagttataaatatgattaagcatataaaaaaacatttaccaaatagt 
tttgtgatagctggtaatgttgggacgcctgaaggagtaagagaacttgagaatgccggt 
gcagatgctacaaaagtaggtattggtccaggaagagtatgtattactaaaattaaaact 

ggatttggtacaggaggttggcaactttctgcgttaaatctttgtaataaggcagctaga
aaacctattattgcagatggagggttaagaacccacggtgatatagccaaatcaattcgt 
tttggtgccactatggtaatgattggctctttatttgctgcccacgaggaatcaccgggt 
gaaaccgtcgagctagatggcaaaaaatataaagaatattttggtagtgcctcagaatat 
caaaaaggtgaacataagaacgttgaaggtaaaaaaatgtttgtagaacacaaaggatct 

cttaaagatacccttactgaaatggaacaagatttacagagttcaatttcatatgcagga
ggaaaggacttgaagtcattaagaacagttgattacgtcatagtaagaaattcaatcttt 
aatggtgatagagattag 

Sequence 3296 

MKIFDYEDIQLIPNKCIVESRSECNTSVKFGPRTFKLPVVPANMQTVMNEELAQWFAEND
YFYIMHRFNEENRIPFIKKMHHAGLFASISVGVKENEFNFIEKLASSSLIPEYITIDIAH 
GHSNSVINMIKHIKKHLPNSFVIAGNVGTPEGVRELENAGADATKVGIGPGRVCITKIKT
GFGTGGWQLSALNLCNKAARKPIIADGGLRTHGDIAKSIRFGATMVMIGSLFAAHEESPG 
ETVELDGKKYKEYFGSASEYQKGEHKNVEGKKMFVEHKGSLKDTLTEMEQDLQSSISYAG 

GKDLKSLRTVDYVIVRNSIFNGDRD*

Sequence 3297 
Contig_0788_pos_4151_3402 
is similar to (with p-value 6.0e-40) 
>sp:sp|P27442|GUAC_ASCSU GMP REDUCTASE (EC 1.6.6.8) (GUANOS
INE 5'-MONOPHOSPHATE OXIDOREDUCTASE). >gp:gp|M82838|NEMGMP_1
Ascaris lumbricoides GMP reductase mRNA, complete cds. NID:
g159660.

atgacaagtgacattaatcataaagatacaatagagtattttaaacaacataaatatttt 
aactatgatgccaatcatattcatttctttaagcaagataacattgttgctttaagtgaa
gaaggaaagcttgttttaaatagagatggacatataatggaaacacctaatggtaatggg 
ggtgtattcaagtctcttaagaaagcaggataccttgataagatgcaacaagatcacgtc 
aaatatatcttcttaaataacattgataatgtcttagttaaagttttagacccgttattt 
gccggttttacagtgacacaaagtaaagacatcacatcaaaaacaattcaacctaaagat 
agtgaaagtgtaggtcggcttgtaaatgttgattgtaaagacactgtgttagagtattct
gaattagatacagacatagcaaatcaatttaacaatgctaatataggtatccatgctttt 
aaactaggtttcattaccagtgctgttgatagagaattaccgtatcatttggcaatcaag 
caattaaagcaattagatgaaaattttggtgtggttgaacgtcctacattaaagtttgaa 
ttattttattttgatatatttaggtacggtacaagctttgtaacattacaagttccaaga
gaagaagaattttcacctttaaaaaataaagaaggtaaggatagtgtacatacagctaca 
gaagatttaaaacgaatggatttgatttaa

Sequence 3298 

MTSDINHKDTIEYFKQHKYFNYDANHIHFFKQDNIVALSEEGKLVLNRDGHIMETPNGNG
GVFKSLKKAGYLDKMQQDHVKYIFLNNIDNVLVKVLDPLFAGFTVTQSKDITSKTIQPKD 
SESVGRLVNVDCKDTVLEYSELDTDIANQFNNANIGIHAFKLGFITSAVDRELPYHLAIK 
QLKQLDENFGVVERPTLKFELFYFDIFRYGTSFVTLQVPREEEFSPLKNKEGKDSVHTAT
EDLKRMDLI* 

Sequence 3299 

Contig_0795_pos_2866_4128

>gp:gp|AJ224946|CGA224946_1 Corynebacterium glutamicum DNA
for L-Malate:quinone oxidoreductase. NID: g3059092.

atgcacatgagtgaagcaaatcataaaaacatcgttgttgtaggtgcaggaattattggt
acgtcagtagcgacaatgctttcaaaagtaagtcctaactggcatatcgatatgtttgaa 
agactagaaggcgctggtattgaaagttcaaatgaaaataataatgctgggacaggtcat 
gcggcattatgtgaattaaactatacagttgaacaagatgatggttcaattgatgcatct 
aaagcgcaagaaattaatgaacaattcgaattatctagacaattctggggtaatttagtt 

aaaaatggtgatatttctaatcctgaagaatttattcaaccattacctcatatcagtttc
gttatgggaccaacaaacgttaactttttaagaaaacgttatgaaacactaagaactctt 
ccaatgttcgatacaatcgaatatacagaagacatggaaacaatgagaaaatggatgcca 
ttaatgatggaaaatcgtgaaccaggtcatcaaatggcagcaagtaaaattgatgaaggt 
acagatgtgaactatggtgcgttaacacgtaagttagcacattacttagaacaaaaatct 

aatgtttcattaaaatacaatcatgatgttgtagatttaacacaaagagaagatggcaaa
tgggaagttgtcgttgaaaatagagaaactaaagaaaaagtaactaaaatagcagataaa 
gtgtttattggtgctggcggtcactctattccgttattacaaaaatctggcgttaaacaa 
agagaacacctaggtggtttcccaatcagtggtcaattcttaagatgtacaaacccagat 
attattaaacaacatgcggctaaagtttacagtaaagagcctcaaggtaagccaccaatg 

actgtaccacaccttgatacacgttatatcaatggtaaacaaacattattatttggtcca
tatgcgaatatcggccctaaattcttgaaattcggttcaaatctagacttattcgaatca 
atcaaaccatataacactactacaatgttggcttcagcagttaaaaatgtacctttaatt 
aaatattcaattgatcaaatgatcaaaactaaagaaggttgtatgaactatttaagaaca 
tttattcctgatgctaaagatgaagattgggaactttacactgctggtaaacgtgttcaa 

gttattaaagataaccatttgaagcaggaccttactttattgcagcagtcgtatttgtta
tag 

Sequence 3300 

MHMSEANHKNIVVVGAGIIGTSVATMLSKVSPNWHIDMFERLEGAGIESSNENNNAGTGH
AALCELNYTVEQDDGSIDASKAQEINEQFELSRQFWGNLVKNGDISNPEEFIQPLPHISF
VMGPTNVNFLRKRYETLRTLPMFDTIEYTEDMETMRKWMPLMMENREPGHQMAASKIDEG
TDVNYGALTRKLAHYLEQKSNVSLKYNHDVVDLTQREDGKWEVVVENRETKEKVTKIADK
VFIGAGGHSIPLLQKSGVKQREHLGGFPISGQFLRCTNPDIIKQHAAKVYSKEPQGKPPM
TVPHLDTRYINGKQTLLFGPYANIGPKFLKFGSNLDLFESIKPYNTTTMLASAVKNVPLI
KYSIDQMIKTKEGCMNYLRTFIPDAKDEDWELYTAGKRVQVIKDNHLKQDLTLLQQSYLL
*

Sequence 3301 
Contig_0795_pos_4741_5160 
is similar to (with p-value 7.0e-17)

>sp:sp|P77279|YBBL_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-B
INDING PROTEIN IN USHA-TESA INTERGENIC REGION. >gp:gp|U82664
|ECU82664_88 Escherichia coli minutes 9 to 11 genomic sequen
ce. NID: g1773084. >gp:gp|AE000155|AE000155_6 Escherichia co
li K-12 MG1655 section 45 of 400 of the complete genome. NID
: g1786692.

atgcaacaaagtgagttaatcggttatacaattgaagataatatgaaatttcctgccgag 
gctagaagtgaagcttttgaccgtgataaagcgaaacaactcatctctcaagtaggatta 
ggtaattatcagttagatgctcaaattgagcacatgtctgggggagagcaacaacgtatt
accatcgctagacaactcatgtatgaacctgaagttttattattggacgaagctactagc 
gctttagatacacataataaaaagaaaattgaagaaattatatttaaactagcagataaa 
gggattgccattttgtggattacgcatagtgatgaccaaagtatgcgtcattttaagcgt 
agaatcacaattactgacggtaagatatcgagtgatgaggagttgaatggtaatgagtaa 

Sequence 3302

MQQSELIGYTIEDNMKFPAEARSEAFDRDKAKQLISQVGLGNYQLDAQIEHMSGGEQQRI 
TIARQLMYEPEVLLLDEATSALDTHNKKKIEEIIFKLADKGIAILWITHSDDQSMRHFKR 
RITITDGKISSDEELNGNE*

Sequence 3303 
Contig_0795_pos_5525_5929 
is similar to (with p-value 2.0e-32) 
>sp:sp|P77307|YBBM_ECOLI HYPOTHETICAL 28.2 KD PROTEIN IN US
HA-TESA INTERGENIC REGION.

atgttagctaataatggtttaatcgcgattaatctcgcttatcagaatttagaaagagca 
tttgttcaagatgtttctgatattgaatccaaacttacgttagcagcgacacctaagctc 
gcatcaaaatcagctattagagaaagtatacgcttagcaattgttcctacaattgattct 
gtaaaaacatatggtctagtttcaattccaggtatgatgacaggattgattatcggaggc 

gttgacccacttcaagcaattaaatttcaattgcttgtcgtgtttattcatacaacagcg
acgattatgtctgcactcattgcaacgtatatgagttacggtcagttctttaatgctcgt 
catcaactcattgctagaacgcaacgcacaagacaaagtagttaa 

Sequence 3304 

MLANNGLIAINLAYQNLERAFVQDVSDIESKLTLAATPKLASKSAIRESIRLAIVPTIDS
VKTYGLVSIPGMMTGLIIGGVDPLQAIKFQLLVVFIHTTATIMSALIATYMSYGQFFNAR
HQLIARTQRTRQSS* 

Sequence 3305 
Contig_0795_pos_6068_7246

is similar to (with p-value 7.0e-24) 

>gp:gp|A08113|A08113_1 Synthetic DNA sequence of chloramphe
nicol-resistance gene. NID: g413362.

atgactgtcgcacgcacaacaacatttatattaagtgttttcatcgtaggtatggttgag 

atgatggttgcaggtattatgaatttaatgagtcaagatcttcatgtttctgaagcggtg
gttgggcaactggtaacgttgtacgcgcttacctttgcgatatgtggtccgttattagtg 
aagttgactcatcggtttacttcgcgatcagtattattatggacgttaattgtctttatc
ttcgggaatggtatgattgccattgcacctcattttggaataatagttgtaggacgtatt 
ttatcttctgccgcagcttcactcattattgtgaaagttttagcactcacagcgatgctc 

acatcagcaaaaaacagaggtaaaatgattggtattgtttatacaggttttagtggggcg
aatgtctttggtgttcccatcggtacagtgattggcgactgggtaggatggcgatttaca
tttttattcattattattgtaagtgtatttgttggtgttttaatgttaatttatctacca 
aaagaagatgaattgtcacatccaaatcaaacacctcgttcatctagtattgaatcacaa 
actggctcaagcgtcataagacctcgtgaggtttttaaatatctgatgattacattttta 

gtgctggttgctaattctgtaacattcgtgtttattaatccattaattttatccaatgga
catgaaatgtcttttgtgtctttagcactacttgttaatggtgtagcaggtgtgattggt 
acttcattaggcggtgttttatctgataagtttactagtaagcgttggttaataatttcg 
atttcaatatttataataatgatgataattcttaacttattattaccaggaacaggatta 
ttattagttggcttatttatgtggaatttaatgcagtggagtacaaatccagctattcaa 

agtggtattattgaacacgtcgaaggagatacaagccaagtgatgagttggaatatgtcg
agtctcaatgccggtattggcgttggcggaatcgtaggtggactcgttatgacacattta 
tcagtggaatatgttacttatactagtgcattgataggtttaattagtcttatcattgta 
ttcactttaaaaaatagacattatgctaaaaatttatga 
Sequence 3306

MTVARTTTFILSVFIVGMVEMMVAGIMNLMSQDLHVSEAVVGQLVTLYALTFAICGPLLV
KLTHRFTSRSVLLWTLIVFIFGNGMIAIAPHFGIIVVGRILSSAAASLIIVKVLALTAML
TSAKNRGKMIGIVYTGFSGANVFGVPIGTVIGDWVGWRFTFLFIIIVSVFVGVLMLIYLP
KEDELSHPNQTPRSSSIESQTGSSVIRPREVFKYLMITFLVLVANSVTFVFINPLILSNG
HEMSFVSLALLVNGVAGVIGTSLGGVLSDKFTSKRWLIISISIFIIMMIILNLLLPGTGL
LLVGLFMWNLMQWSTNPAIQSGIIEHVEGDTSQVMSWNMSSLNAGIGVGGIVGGLVMTHL
SVEYVTYTSALIGLISLIIVFTLKNRHYAKNL*

Sequence 3307

Contig_0799_pos_3893_3078

is similar to (with p-value 1.0e-20)

>gp:gp|U67964|EVU67964_2 Ectromelia virus H14-B and H14-E g
enes, complete cds. NID: g2145123.

atgcattatataaaatttattgagtcaaaagataatacaaaactttatatgaaagtgaat
gatattcaagatgcaaaagcgaatatcattatagctcatggtgtggcagaacatttagat 
cgttatgatgagataacagcatatttaaatgaagcgggttttagtgttattagatatgat 
caaagagggcatggtcgttctgaaggcaagcgtgccttttatagcaatagtaatgaaatt 
gtcgaagatttagatgcgataataaattatgtgaagtcaaactttgaaggtaaagtttac 

ttaatcggtcatagtatgggtggttatacagtcactttatatggaacgaaacatccaaat
acagtgaatggtattataacttctggagcattaacacgttataataataaactatttggc 
aatcctgatagaaacatatcacctgatacttatatagaaaacaatttaagtgagggggta 
tgttctgatttagaggtaatggaaaaatataaacttgatgatttgaatgcgaaacaaatc 
tctatggggctcgtcttttcaataatggatggtgttaggtatttgaaagacaatgctcaa 

caatttacagataatattttgatattgcatggcaaggaagatgggctagtaagctatgta
gattctttacagctttatcaagaaataggatcagcacataaatcattacacatctatgat 
cgtttggagcatgaaatatttaatgaaagttcttataatagaactatttttaacgaagtt 
attgaatggcttgaaacggaattaacttataactaa 

Sequence 3308

MHYIKFIESKDNTKLYMKVNDIQDAKANIIIAHGVAEHLDRYDEITAYLNEAGFSVIRYD
QRGHGRSEGKRAFYSNSNEIVEDLDAIINYVKSNFEGKVYLIGHSMGGYTVTLYGTKHPN 
TVNGIITSGALTRYNNKLFGNPDRNISPDTYIENNLSEGVCSDLEVMEKYKLDDLNAKQI 
SMGLVFSIMDGVRYLKDNAQQFTDNILILHGKEDGLVSYVDSLQLYQEIGSAHKSLHIYD 

RLEHEIFNESSYNRTIFNEVIEWLETELTYN*

Sequence 3309 

Contig_0802_pos_4926_3982

is similar to (with p-value 3.0e-73) 

>gp:gp|S72926|S72926_1 Hordeum vulgare glucose and ribitol
dehydrogenase homolog mRNA, complete cds. NID: g633889. 
gtgtgggcttggtacactcatttacaagaattcaattctgatccgaatatacaaaatttt 
gatgaaatgcttaacaaactacaaaaggtcagtttaattagtgcaagtgaagaaagtggg 
actaaaaaaattgtagatcactttgtcgaagaattatatagcgaagaaccaaaacaaaaa 

atcaatacaggttataaactggtggattacaaaataggtggtttagaacctacacagttg
attgtaatcgctgcgagaccgtcagtaggtaaaacggggtttgcgcttaatatgatgctt 
aatatagcgtctcaaggctataaaacttcattcttcagtctagagacaactggcgtgtct 
gtattgaaaaggatgttatcagcagaaactgggatagaactaactcgtatcaaagaaatt 
aaagatttagaaccggatgaattaacacgtttaacaactgcagcagacagaatactcaaa 

cttgatatagatatacacgataaaagcaatattactacacatgatgtacgtaaacaagcg
atgaagaacaaagatgtgcaacaggttatcttcattgactacttacaacttatgcagaca
gacagtaagttagatcgtcgtaatggtatcgaaaagatatcgcgagatttgaagattatt 
gcaaatgaaacaggtgcaattattgtgttgctatctcaattgagcagaggtgtagaaaca 
agaaatgacaaaagacctatgctatctgacatgaaagaagcaggtggaattgaagcagat 

gcaagtttagctatgttgttatatcgagatgattactacaaccgtgatgatgttgatgac
tcaggcaagtcaattgttgaatgtaacatcgcaaagaataaagacggagaaacaggtgta
gttgagtttgagtactacaagaaaacgcagaggttcttcacatga

Sequence 3310 
VWAWYTHLQEFNSDPNIQNFDEMLNKLQKVSLISASEESGTKKIVDHFVEELYSEEPKQK
INTGYKLVDYKIGGLEPTQLIVIAARPSVGKTGFALNMMLNIASQGYKTSFFSLETTGVS 

VLKRMLSAETGIELTRIKEIKDLEPDELTRLTTAADRILKLDIDIHDKSNITTHDVRKQA 
MKNKDVQQVIFIDYLQLMQTDSKLDRRNGIEKISRDLKIIANETGAIIVLLSQLSRGVET 
RNDKRPMLSDMKEAGGIEADASLAMLLYRDDYYNRDDVDDSGKSIVECNIAKNKDGETGV
VEFEYYKKTQRFFT*

Sequence 3311 
Contig_0802_pos_3039_2170
is similar to (with p-value 3.0e-36) 

>sp:sp|P37469|DNAC_BACSU REPLICATIVE DNA HELICASE (EC 3.6.1
.-). >gp:gp|D26185|BAC180K_4 B. subtilis DNA, 180 kilobase r
egion of replication origin. NID: g467326. >gp:gp|Z99124|BSU
B0021_149 Bacillus subtilis complete genome (section 21 of 2
1): from 3999281 to 4214814. NID: g2636442.

gtgaatttattaaaatttcacaacaaaatcaaaggatatactcaaaatagacaaccaggt 
attgaagcggatatggaacctaaacccattgcagaattagaagaatataaagcagcagga 
aagttagagaataaagttgctctaataacaggaggagattcaggtattggacgtgcgata 
gcaatactatatgctaaagaaggggcaaatgttgctattggttattatgacgaacatcaa 
gatgccgaagacacagttaatcgacttcaagaaatgggtgtaaaagctaaagcttatgct 
catgacttaaaagatgaaaagcaatctcaaaagttaatcaaagatgtcataaatgacttc 
ggtagtttaaatatattagtaaataatgggggcgtgcaatttccacgcgatcattttgaa 
gatatcactccacaacaagtgaaagaaacttttatgacgaatatttttggtatgatgttt 
ttatcccaatcagcagtaccttacctatctgaaggagatacaattataaacactacaagt 
gtcacagcatatagaggatcggggcatctcattgattattcagctacaaaaggtgccata 
gtaccgtttacccgttctcttgctactactttaatggaaaagggaattcgcgttaacgcc 
gttgcgcccggcccaatctattcacctttaattcctgcgacatttgatgaagaaaaagta 
gaacatcaaggtgacgaaacgccgatgggtcgtcgtggacaaccagcagaacttgcacct 
tcttatgtcttcttagcaacacatgcagatagttcctatattactggtcaagtcattcat 
gtcaatggtggcgattttatcacatcttaa 

Sequence 3312 

VNLLKFHNKIKGYTQNRQPGIEADMEPKPIAELEEYKAAGKLENKVALITGGDSGIGRAI 
AILYAKEGANVAIGYYDEHQDAEDTVNRLQEMGVKAKAYAHDLKDEKQSQKLIKDVINDF 
GSLNILVNNGGVQFPRDHFEDITPQQVKETFMTNIFGMMFLSQSAVPYLSEGDTIINTTS
VTAYRGSGHLIDYSATKGAIVSFTRSLATTLMEKGIRVNAVAPGPIYSPLIPATFDEEKV 
EHQGDETPMGRRGQPAELAPSYVFLATHADSSYITGQVIHVNGGDFITS* 

Sequence 3313 

Contig_0804_pos_4400_3708

is similar to (with p-value 3.0e-62) 

>sp:sp|P96612|DDL_BACSU D-ALANINE--D-ALANINE LIGASE (EC 6.3
.2.4) (D-ALANYLALANINE SYNTHETASE). >gp:gp|AB001488|AB001488
_40 Bacillus subtilis genome sequence, 148 kb sequence of th
e region between 35 and 47 degree. NID: g1881226. >gp:gp|Z99
106|BSUB0003_103 Bacillus subtilis complete genome (section
3 of 21): from 402751 to 611850. NID: g2632653.
atggataaactcgtgatgaaacaattatttgagcatagaggtttacctcaattaccttat 
attagctttttaagaagtgagtatgaaaaatatgaaaataatatcattaaattagttaat 

gataagttaacatatccggtatttgtaaaacctgctaatctcggttcaagtgttggtata
agtaaatgtaacaatgaagaagaattaaaatctgggatagctgaagcattccaatttgat 
cgtaaacttgtcattgaacaagggattaatgctagagagatcgaagtagctgtcttaggt 
aacgattatcctgaaacgacatggcctggtgaggttgttaaggatgtagcgttttatgat 
tataaatcaaagtataaagacggtaagattagattagatattccagcagatttagatcaa 

gatgttcaaatgacattaagaaacatggcattagaggcctttaaagctactgattgttcg
ggattagttcgtgcagatttctttgttactgatgataatcaaatttatattaatgaaaca 
aatgccatgccaggatttactgcatatagtatgtatccaaacttatggaaaaatatgggc 
ttatcttaccctgatttaattgctaaattgattgatttagctaaagaacgttatgaagat 
aaaaagaaaaataaatataaaattgattattag
Sequence 3314

MDKLVMKQLFEHRGLPQLPYISFLRSEYEKYENNIIKLVNDKLTYPVFVKPANLGSSVGI
SKCNNEEELKSGIAEAFQFDRKLVIEQGINAREIEVAVLGNDYPETTWPGEVVKDVAFYD
YKSKYKDGKIRLDIPADLDQDVQMTLRNMALEAFKATDCSGLVRADFFVTDDNQIYINET
NAMPGFTAYSMYPNLWKNMGLSYPDLIAKLIDLAKERYEDKKKNKYKIDY* 

Sequence 3315 
Contig_0808_pos_6745_6269
is similar to (with p-value 1.0e-55)

>gp:gp|AF068246|AF068246_1 Mus musculus SA protein mRNA, co
mplete cds. NID: g3928675.

gtgtttaattacttatctacgaaagaagacgaaagagaatgggttgaagcaattagagta 
gcaagaaatatcctaaaacaaaaagctatggacccatttaatggtggcgaaatttcacca 

ggaccacaagttcaaacggatgaagaaattctagattgggtacgtaaagatggagaaact
gcattacatccatcttgtagcgcgaaaatgggacctgcatctgacccaatggcagtagtc 
gatccattaactatgaaagtacatggtatggaaaatttacgtgtcgttgatgcttcagca 
atgcctagaacaacaaatggtaatattcatgcacctgtattgatgttagctgagaaagca 
gcggacattattcgtggtagaaaaccgcttgaacctcaatatgttgactattataaacat 

ggtattgatgatgaaaaagcaggtgcaatggaagatgatccattctaccaatattaa

Sequence 3316 

VFNYLSTKEDEREWVEAIRVARNILKQKAMDPFNGGEISPGPQVQTDEEILDWVRKDGET 
ALHPSCSAKMGPASDPMAVVDPLTMKVHGMENLRVVDASAMPRTTNGNIHAPVLMLAEKA
ADIIRGRKPLEPQYVDYYKHGIDDEKAGAMEDDPFYQY*

Sequence 3317 

Contig_0808_pos_3296_2034

is similar to (with p-value 7.0e-29) 
>sp:sp|P17444|BETA_ECOLI CHOLINE DEHYDROGENASE (EC 1.1.99.1
) (CHD). >gp:gp|X52905|ECBET_5 Escherichia coli betT, betI,
betB and betA genes. NID: g48714. >gp:gp|M77738|ECOBETA_1 E.
coli choline dehydrogenase (betA) gene, complete cds. NID: g
145401. >gp:gp|AE000138|AE000138_2 Escherichia coli K-12 MG1
655 section 28 of 400 of the complete genome. NID: g1786501.

atgaataaatcaaatttactagcacctgagaattataatattgttacagaaatagaaaaa 
tatgcctcagaagatcataaaaaagccattatttacaaggataacgagcatgaaaatatt 
tctgtaagttataaagaacttatcagtaatgctaataaagtagggaatgtattcctcaat 

catgggctaaaaaagggagataaagttctcatcatgatgccacgtgcaatcgttacatat
gaattatatattgcagcattgaaactagggatagcgattgttccaagttcggaaatgtta 
cgaacaaaagatttacaatatcgaattactcacggtgagattgatgcagttatttcattt 
gattctctaactaaagaatttgaaaacgttaaagaatatgaccaattaaaaaaatttata 
gtagctggtcacaaagaagattgggtttcaatagaagatgaaaaagaaaaagtaagtgat 

gaccttaaaggcgcagatacaacacgagatgatttggcgattctttcttatacatcaggt
acaacaggcaatccaaaagcagtaacgcattcacatggatgggggtatgcccatttacaa 
atggcaccaaaacattggttatgtatacaagagaatgatcttgtatgggcaactgcagca 
ccagggtggcaaaagtgggtgtggagtccatttttatctgtattagggatgggagcaaca 
gcatttgtctataacggtcgtttccaccctgaaacatatctcgagttacttcaaaattat 

caaattaatgttctatgttgtacaccaacagaatatcgtatgatggctaaacttagtcat
ttagaacagtacaatttagagtatttacacagtgcggtgtctgcgggtgaacctttaaat 
cgagaagttgttgaacaatttaaacgtcattttaatattactgttcgagatggatatgga 
caaaccgaaagtacattgttgatcggatttctaaaagatactgaaccacgtatgggttct 
atgggcaaaggtatacctggtagttttgttactgtcattgacgatgatggtaaagaggtt 

ggtccaaatgttaaaggtaatatcgccgtgcctttagacttaccggctttatttaaaggt
tactttaaagatgaagcacgcacaaaagcagcttcaacaggtgattattatgttactgga 
gaccaagctcatattgatatatatcagtggtcaatcatttttacgtcctttattttaaaa 
taa 
Sequence 3318

MNKSNLLAPENYNIVTEIEKYASEDHKKAIIYKDNEHENISVSYKELISNANKVGNVFLN
HGLKKGDKVLIMMPRAIVTYELYIAALKLGIAIVPSSEMLRTKDLQYRITHGEIDAVISF
DSLTKEFENVKEYDQLKKFIVAGHKEDWVSIEDEKEKVSDDLKGADTTRDDLAILSYTSG
TTGNPKAVTHSHGWGYAHLQMAPKHWLCIQENDLVWATAAPGWQKWVWSPFLSVLGMGAT
AFVYNGRFHPETYLELLQNYQINVLCCTPTEYRMMAKLSHLEQYNLEYLHSAVSAGEPLN
REVVEQFKRHFNITVRDGYGQTESTLLIGFLKDTEPRMGSMGKGIPGSFVTVIDDDGKEV
GPNVKGNIAVPLDLPALFKGYFKDEARTKAASTGDYYVTGDQAHIDIYQWSIIFTSFILK
* 

Sequence 3319 
Contig_0810_pos_827_1459
is similar to (with p-value 8.0e-31) 
>gp:gp|D64024|D64024_2 Sulfolobus sp. DNA for 2-oxoacid:fer
redoxin oxidoreductase subunit alpha and beta, complete cds.
NID: g1565182.

atggcaaacaaagatttaacagttatcgcttctggtggtgatggagacggctatgcaata 
ggaatgggacatactattcatgctcttagacgtaatatgaatatgacgtatattgtcatg 
gacaatcaaatatatggattaactaaaggacaaacatcaccttcctcagctaaaggattt 

gtaactaaatcaacacctaaaggaaatatagaaaagaatgtagctccattggaattggca
ctgtcctctggtgcaacttttgtagcacaaggattctcaagtgatataaaggcattaact 
aaaatgattgaagatgcgattcatcatgatggtttttcttttgttaatgttttctcacct 
tgtgttacttacaataaagtgaatacttatgactggtttaaagaacatttaacaagtatc 
gatgatattgagggctatgacatcacagataaacaacttgctatgaaaactgtgctggat 

catgagtcactggttaaaggtatcgtttatcaagatacaacaacaccttcttatgaatcg
caaatttcagaactagaacatgaggcgttagctaaaagagatattcatattacagaagaa 
actttcaacgatttaactgcacaatttttataa 

Sequence 3320 

MANKDLTVIASGGDGDGYAIGMGHTIHALRRNMNMTYIVMDNQIYGLTKGQTSPSSAKGF
VTKSTPKGNIEKNVAPLELALSSGATFVAQGFSSDIKALTKMIEDAIHHDGFSFVNVFSP
CVTYNKVNTYDWFKEHLTSIDDIEGYDITDKQLAMKTVLDHESLVKGIVYQDTTTPSYES 
QISELEHEALAKRDIHITEETFNDLTAQFL* 

Sequence 3321

Contig_0812_pos_620_1003
is similar to (with p-value 3.0e-63) 
>pir:pir|A43577|A43577 regulatory protein pfoR - Clostridiu
m perfringens

atgactgctaatgcatatgggttgttatcacctactgaagtgattgcactaccaatttgt
ttaagtactgcaataacccctggagaaattaagtttgcaagaccaaatgtaactgctgga 
gccaataaaataacaacaattaaatctagtccctctggaacttttttctcaacaaacttg 
attccaaatgcaacaacatacgctgcgataaacgcaggtaataattttgagtcatgtaat 
actaaacctacaatgactgcaaatactggagacacttctaactttagacaagttaatata 

ccaacagctataccacttaaactccctgctagatcaccaatttcttggaaaaatttaaca
tggaatacgccaccaattgcgtaa 

Sequence 3322 

MTANAYGLLSPTEVIALPICLSTAITPGEIKFARPNVTAGANKITTIKSSPSGTFFSTNL 
IPNATTYAAINAGNNFESCNTKPTMTANTGDTSNFRQVNIPTAIPLKLPARSPISWKNLT
WNTPPIA* 

Sequence 3323 
Contig_0812_pos_2366_5200
is similar to (with p-value 2.0e-07)

>sp:sp|O34863|UVRA_BACSU EXCINUCLEASE ABC SUBUNIT A. >gp:gp
|Z99122|BSUB0019_13 Bacillus subtilis complete genome (secti
on 19 of 21): from 3597091 to 3809700. NID: g2636029. >gp:gp
|AF017113|AF017113_13 Bacillus subtilis 300-304 degree genom
ic sequence. NID: g2618830.

atgaaaggaccgtcaattgtagtaaaaggtgcacgagcacataatttaaaaggagtagat 

attgaattacctaaaaacaagttgattgtcatgactggactttcaggctcaggtaaatct 
tcgttagcgttcgatactatttatgctgaaggtcaacgacgttatgtggaatcgttaagt 
gcatatgcgcgacaatttttaggacaaatggacaaacctgatgtagataccattgaaggt
ttgtcaccagcaatatctattgatcagaagacgacaagtaaaaacccacgttctaccgtt 
gccactgtgactgaaatctatgattatattcgattattatatgcgcgtgtcggtaaaccg 
tactgtccttatcacggtatagaaattgaatcacaaactgttcaacaaatggttgatcgt 
attttagaattggaagaacgtacaaaaattcaattacttgctccagttatatcacataga 

aaaggaagtcatgaaaaattaattgaagatataggtaagaaaggatatgtgcgtttacgt
gttgatgatgaaatcgtcgatgtaaatgaagtgcctcaattggataaaaataaaaatcac 
acaattgaagttgttgttgatagacttgtagttaaagatggcattgagacgcgtttagcg 
gactctattgagacagcacttgaacttgctgaaggtaatttaacagttgacgttattaat 
ggagaagaacttaaattctctgagaaccacgcatgtcctatatgtggtttctctattggt 

gaattagaacctagaatgtttagtttcaacagtccgtttggtgcgtgtccaacttgtgat
ggtttaggtcaaaaactaaaagtagacctagatttagtcattcctgataaaaataaaact 
ttaaatgaaggtgcaattgaaccatgggaaccaacaagttcggatttttatccaacttta 
ttaaaacgtgtatgtgaagtctacaaaattaatatggataagccatataaaaaattgact 
gatagacaaaaaaatatacttatgaatggttctggagaaaaagaaattgaatttactttc 

acgcaacgaaatggtggtactcgtaaacgtaaaatggtttttgaaggtgtggtacctaac
attgaccgtcgttaccatgaatctccctctgaatatacacgcgaaatgatgagtaaatat 
atgacagagttaccgtgtgaaacttgtcatggcaaacgcttaagcaaagaagcgttatct 
gtatacgtaggcgactataatataggtgaagtcgttgaatattcaattaaaaacgcgctt 
tattatttcgaaaatttaaaattaagcgaccaagataaatcgattgcagatcaaatttta 

aaagagattatttcaagattatcatttttaaataatgttggtttggaatatttgacttta
gatcgctcatcagggactttatctggaggggaagcacaacgtattcgtttagcaacacaa 
attggttcacgattaaccggagtactatatgttttagatgaaccttctattggattacat 
caaagagataatgatagattaattaatactttaaaagaaatgcgtgatttaggtaataca 
cttattgtcgttgaacatgacgatgatactatgagagcagctgattatttagttgatgtg 

ggtccgggagctggtaaccacggtggagaggttgtctcaagtggtacccctaataaagta
atgaaagataaaaaatccttaactggtcaatatttaagtggaaaaaaacgaattgaagtc 
cctgaatacagacgagaaatcaccgatagaaagattcaaattaaaggtgctaaaagtaat 
aatttgaaaaatgtaaatgtagacttcccactatctgtcttaactgttgttacaggtgtg 
tcaggttctggtaaaagttcactcgtcaatgaaattttatataaagcattagctcaaaaa 

attaataaatctaaagtgaaacctgggaattttgatgaaattaaaggaattgatcaatta
gataaaatcattgatattgatcaatcgccaataggtagaacaccacgttctaacccagcc 
acatacactggtgtctttgatgacataagagatgtctttgcacaaacgaatgaagctaaa 
atacgaggttatcaaaaaggtagatttagttttaatgtcaaaggtggacgatgtgaagct 
tgtaaaggtgatggaattataaaaattgaaatgcattttttaccagatgtctatgtacct 

tgtgaagtatgtgatggtaaacgctataatcgtgagactttagaggtaacatacaaaggt
aaaaatattgcggatgtattagaaatgactgttgaagaagctacgcatttctttgaaaat 
attcctaagattaaacgtaaattacaaacacttgtagatgttgggttggggtacattact 
ttaggtcaacaagctactacattatctggtggcgaagcgcaacgtgtaaaactcgcatca 
gaattgcacaaacgttcaacggggcgttctatttatattcttgatgaaccaactacagga 

ttacatgtcgacgatataagtcgtttattaaaggtattgaatcgtatagtggaaaatggt
gatacggtcgttattatcgaacacaatcttgatgttattaaaacggctgatcatattatt 
gatttaggtccagaaggcggtgaaggtggaggaacaatcatcgcaactggtacacctgaa 
gagattgctcaaaataaagggtcttacactggtcaatacttaaaaccagtattagagaga 
gacagcgttgaatag 

Sequence 3324 

MKGPSIVVKGARAHNLKGVDIELPKNKLIVMTGLSGSGKSSLAFDTIYAEGQRRYVESLS
AYARQFLGQMDKPDVDTIEGLSPAISIDQKTTSKNPRSTVATVTEIYDYIRLLYARVGKP 
YCPYHGIEIESQTVQQMVDRILELEERTKIQLLAPVISHRKGSHEKLIEDIGKKGYVRLR 
VDDEIVDVNEVPQLDKNKNHTIEVVVDRLVVKDGIETRLADSIETALELAEGNLTVDVIN

GEELKFSENHACPICGFSIGELEPRMFSFNSPFGACPTCDGLGQKLKVDLDLVIPDKNKT 
LNEGAIEPWEPTSSDFYPTLLKRVCEVYKINMDKPYKKLTDRQKNILMNGSGEKEIEFTF 
TQRNGGTRKRKMVFEGVVPNIDRRYHESPSEYTREMMSKYMTELPCETCHGKRLSKEALS
VYVGDYNIGEVVEYSIKNALYYFENLKLSDQDKSIADQILKEIISRLSFLNNVGLEYLTL
DRSSGTLSGGEAQRIRLATQIGSRLTGVLYVLDEPSIGLHQRDNDRLINTLKEMRDLGNT
LIVVEHDDDTMRAADYLVDVGPGAGNHGGEVVSSGTPNKVMKDKKSLTGQYLSGKKRIEV
PEYRREITDRKIQIKGAKSNNLKNVNVDFPLSVLTVVTGVSGSGKSSLVNEILYKALAQK
INKSKVKPGNFDEIKGIDQLDKIIDIDQSPIGRTPRSNPATYTGVFDDIRDVFAQTNEAK
IRGYQKGRFSFNVKGGRCEACKGDGIIKIEMHFLPDVYVPCEVCDGKRYNRETLEVTYKG
KNIADVLEMTVEEATHFFENIPKIKRKLQTLVDVGLGYITLGQQATTLSGGEAQRVKLAS
ELHKRSTGRSIYILDEPTTGLHVDDISRLLKVLNRIVENGDTVVIIEHNLDVIKTADHII
DLGPEGGEGGGTIIATGTPEEIAQNKGSYTGQYLKPVLERDSVE*

Sequence 3325

Contig_0812_pos_1152_109 

is similar to (with p-value 2.0e-22)

>pir:pir|A43577|A43577 regulatory protein pfoR - Clostridiu
m perfringens

atggatattattttaggagtagggactttagtactcgttcttattatcatgacgcttttc
ttaaattttgcgccatatggtaaacaaggtttacaagctttatcaggggctgcttgtgcc 
acgtttttaccacaggcgttcttaagttacgcaattggtggcgtattccatgttaaattt 
ttccaagaaattggtgatctagcagggagtttaagtggtatagctgttggtatattaact 
tgtctaaagttagaagtgtctccagtatttgcagtcattgtaggtttagtattacatgac 

tcaaaattattacctgcgtttatcgcagcgtatgttgttgcatttggaatcaagtttgtt
gagaaaaaagttccagagggactagatttaattgttgttattttattggctccagcagtt 
acatttggtcttgcaaacttaatttctccaggggttattgcagtacttaaacaaattggt 
agtgcaatcacttcagtaggtgataacaacccatatgcattagcagtcattttaggactt 
gttattcctgtaactggtatgacgccattaagctcaatggtacttacaagcttattaggt 

ttaactggtattccaatggcaattggtgcattaacatgtacaggagcatcttttgttaat
ggaatcttatttagcaaattaaaaattggtaataaaggtaatgccttcgcggtatttgta 
gaaccgttaactcaaattgacttaattgccaaatatccactacaactgtttggtgcgaat 
gccattattggtgttgtaaatgcttgtattgtcacatacagtggactaattattgatatt 
aaaggtatggcaacacctatagcaggtgctattgtactttatggctttaacgacgctgta 

agatctacaattacaattatcgcagtagcaattgcaagtgtgatattagcgtacgttatt
agtgctattattaataaatttaacttgatgaatgtcggattcaagttaccacgtagaaaa 
aaccaagttaaggagagtgtttaa 

Sequence 3326

MDIILGVGTLVLVLIIMTLFLNFAPYGKQGLQALSGAACATFLPQAFLSYAIGGVFHVKF
FQEIGDLAGSLSGIAVGILTCLKLEVSPVFAVIVGLVLHDSKLLPAFIAAYVVAFGIKFV
EKKVPEGLDLIVVILLAPAVTFGLANLISPGVIAVLKQIGSAITSVGDNNPYALAVILGL
VIPVTGMTPLSSMVLTSLLGLTGIPMAIGALTCTGASFVNGILFSKLKIGNKGNAFAVFV
EPLTQIDLIAKYPLQLFGANAIIGVVNACIVTYSGLIIDIKGMATPIAGAIVLYGFNDAV

RSTITIIAVAIASVILAYVISAIINKFNLMNVGFKLPRRKNQVKESV*

Sequence 3327 
Contig_0814_pos_2358_4124 
>sp:sp|O67589|SYD_AQUAE ASPARTYL-TRNA SYNTHETASE (EC 6.1.1.
12) (ASPARTATE--TRNA LIGASE) (ASPRS). >gp:gp|AE000750|AE0007
50_9 Aquifex aeolicus section 82 of 109 of the complete geno
me. NID: g2983999.

atgaataaaagaacaacgtattgtggtttagtcacagaagaatttttaaacgaaaaagta 
acattaaaaggttgggttcataacagacgagatttaggtggattaatttttgttgattta 

agagatcgtggaggtattgtccaaatagtttttaatcctgacttttccgaagaagcattg
caagttgctgaaacagtacgctcagaatatgtagtcgaagttgaaggtgtagtaacaaaa 
cgtgatgctgaaactattaacccaaaaatcaaaacaggtcaagttgaggttcaagtttca 
aatattgagattattaacaaatcagaaacacctccattttcaattaatgaagaaaatgta 
aacgttgatgaaaatattcgattaaaatatagatatttagatttacgtagacaagaatta 

gcgcaaacttttaaaatgagacatcaaactacgcgttctatccgtcaatacttagataat
aatggcttcttcgatattgaaacaccagtattaacaaaatcaacacctgagggtgcgcga 
gattatctagtaccttcccgtgtacacgagggtgaattttacgcgttgccacaatcacca 
caattatttaaacaactattaatgataagtggttttgataaatattatcaaattgttaaa 
tgtttccgtgatgaagacttacgtgcagatcgtcaaccagaattcactcaagttgatatt 
gaaacgagttttgtcgatcaagaagatatcatagccatgggtgaagatatgttacgtaag
gttgtaaaagatgtaaaaggaatagacgttagtggcccattcccacgtatgacatatgca 
gaggctatggaccgttttggttcagataaacctgacactcgtttcggtatggaacttatc 
aatgtgtcacagcttggtaaagaaatgaattttaaagtttttaaagatacggtagataac 
aacggcgaaattaaagcaattgtcgcaaaagacgctgcaaataaatatacacgtaaagac
atggatgcattaacagagtttgtaaatatatatggtgcaaaaggattagcttgggttaaa 
gttgttgatgatggtttaagtggcccaattgctagatttttcgaagatgttaatgttgaa 
acacttaaacagttaacagaagctaaacctggagatttagtaatgtttgtagctgataaa 
cctaatgttgttgctcaaagtttaggggctttaagaattaaattagcaaaagaattaggt 

ttaattgatgaatcaaaattaaatttcttatgggtaactgattggccgttattagagtat
gatgaagacgcaaaacgttatgtagcagcacatcatccatttacttcacctaaaagagaa 
gatatcgaaaagctagacactgaacctgaaaatgtacaagccaacgcttatgatattgtt 
ctaaatggttatgaacttggtggtggttctataagaatacacgatggtgaattgcaacaa 
aaaatgtttgaagtattaggatttactaatgaacaagctcaagaacaatttggtttctta 

ttagatgcttttaaatacggtgctccacctcatggtggcatcgcgttaggtttagataga
cttgtgatgttattaacaaatagaacaaacttgagagatacaattgcattccctaaaaca 
gcatcagctacatgtcttttaactgacgctccaggagaagtatctgataaacaactccaa 
gaactctcactaagaatcagacactag 

Sequence 3328

MNKRTTYCGLVTEEFLNEKVTLKGWVHNRRDLGGLIFVDLRDRGGIVQIVFNPDFSEEAL 
QVAETVRSEYVVEVEGVVTKRDAETINPKIKTGQVEVQVSNIEIINKSETPPFSINEENV
NVDENIRLKYRYLDLRRQELAQTFKMRHQTTRSIRQYLDNNGFFDIETPVLTKSTPEGAR 
DYLVPSRVHEGEFYALPQSPQLFKQLLMISGFDKYYQIVKCFRDEDLRADRQPEFTQVDI 

EMSFVDQEDIIAMGEDMLRKVVKDVKGIDVSGPFPRMTYAEAMDRFGSDKPDTRFGMELI
NVSQLGKEMNFKVFKDTVDNNGEIKAIVAKDAANKYTRKDMDALTEFVNIYGAKGLAWVK 
VVDDGLSGPIARFFEDVNVETLKQLTEAKPGDLVMFVADKPNVVAQSLGALRIKLAKELG
LIDESKLNFLWVTDWPLLEYDEDAKRYVAAHHPFTSPKREDIEKLDTEPENVQANAYDIV 
LNGYELGGGSIRIHDGELQQKMFEVLGFTNEQAQEQFGFLLDAFKYGAPPHGGIALGLDR 

LVMLLTNRTNLRDTIAFPKTASATCLLTDAPGEVSDKQLQELSLRIRH*

Sequence 3329 

Contig_0814_pos_4680_5456

is similar to (with p-value 6.0e-34) 

>sp:sp|Q57097|YGDL_HAEIN HYPOTHETICAL PROTEIN HI0118. >pir:
pir|C64049|C64049 molybdopterin biosynthesis protein (chlN)
homolog - Haemophilus influenzae (strain Rd KW20) >gp:gp|U32
698|U32698_1 Haemophilus influenzae Rd section 13 of 163 of
the complete genome. NID: g3212178.

atgaaacatcaattttcaaggaatgaattagcaataggacaagaagggctgaacttacta
aaaaataagactgttgcagttttaggtgttggtggcgtcgggtcatttgcagctgaggca 
ttggctcggactaatatagggcacatcatacttatagataaagatgatgtcgatattaca 
aatgtgaacaggcaaattcatgcactgacttcaactattggtcaaagtaaagtcacgcta 
atggaagaaagaatcaaattaataaatcccgattgtaaagtaacttctttgcatatgttt 

tataccgaggaaacatacaaagatatcttcaataattatgatattgattattttattgat
gcaagcgatacaatcatttataaagttcatctcatgaaagagtgtttagaaagaggaatt 
gagttaatttcaagtatgggtgcagcaaataagactgacccgacacgttttgaaattgca 
gatatttcaaaaacacatactgatcctatggctaaagtaattagaaatcgtttaaaacgc 
cttggtattcgtaaaggtgttaaagtagtattttctgatgaaagtcctattgttattcgc 

gaggacgtaaaagaaacagtaggagataaaaatgcaatcaatagaaaagggcaaatgcct
ccatcttctaatgcatttgttccaagtgtagtaggccttatttgtgcaagctacgttgtc 
aacgatattttaaaagatatacctgtaaggcgaattaaagataaaggacaaaattaa 

Sequence 3330

MKHQFSRNELAIGQEGLNLLKNKTVAVLGVGGVGSFAAEALARTNIGHIILIDKDDVDIT
NVNRQIHALTSTIGQSKVTLMEERIKLINPDCKVTSLHMFYTEETYKDIFNNYDIDYFID 
ASDTIIYKVHLMKECLERGIELISSMGAANKTDPTRFEIADISKTHTDPMAKVIRNRLKR 
LGIRKGVKVVFSDESPIVIREDVKETVGDKNAINRKGQMPPSSNAFVPSVVGLICASYVV
NDILKDIPVRRIKDKGQN*
Sequence 3331
Contig_0815_pos_2100_949
is similar to (with p-value 2.0e-49) 
>gp:gp|U59234|SPU59234_4 Synechococcus PCC7942 biotin carbo
xylase (accC) gene, complete cds. NID: g2661137. 
atgcgtgcagaaaatgaattagtagaacaaaaacaacaagaaaaacaaaaagcattgtat 
aaacaagagttagcatggatgcgagcaggagcaaaggcaagaactactaaacaacaggca 
cgtatcaatagatttaatcaactagaatcagacgttaagacgcaacatacacaagataag 

ggtgaacttaatcttgcatattcaaggttaggtaaacaagtatatgaattaaagaattta
tcaaaatcaattaataataaagttttatttgaagatgtcactgaaattattcaaagtggt 
agacgtataggtattgtaggacctaatggagcgggaaaaacaacattacttaatattcta 
agtaatgaagatcaggactatgagggtgagcttaaaatcggtcagactgttaaggtagct 
tattttaagcaaacagaaaagacacttgaccgtgatattagagtgattgactacctaaga 

gaagaaagtgaaatggctaaagaaaaagatggtacctcaatttcagttacacaattgtta
gaaagatttttatttccgagcgctacacacggtaaaaaagtttataaactctcaggtgga 
gaacaaaaacgtctgtatttattgcgtttacttgttcataaacctaatgtactcctttta 
gatgaaccgactaatgatttagatactgaaacacttacgattttagaagattacattgat 
gatttcggtggttctgtcattacggtcagtcatgatcgttatttcttaaataaagtggta 

caagaatattggtttattcatgatggtaaaatcgaaaaaattattggatcatttgaagat
tatgaatcttttaaaaaggaacatgaacgccaagccatgctatctaaacaaactgaacaa 
caaaataaacataagcatcaaccaaaaaagaaaacaggactatcttataaagagaagtta 
gaatacgaaacaattatgacgcgtatagaaatgactgaaacgcgtttagaagaccttgaa 
caagaaatgattaatgcaagtgataattatgcaagaatcaaagaacttaatgaggaaaaa 

gagcaacttgaagcaacctatgaagcagacatcacgagatggagtgagcttgaggaaatt
aaagaacaataa 

Sequence 3332 

MRAENELVEQKQQEKQKALYKQELAWMRAGAKARTTKQQARINRFNQLESDVKTQHTQDK 
GELNLAYSRLGKQVYELKNLSKSINNKVLFEDVTEIIQSGRRIGIVGPNGAGKTTLLNIL

SNEDQDYEGELKTGQTVKVAYFKQTEKTLDRDIRVIDYLREESEMAKEKDGTSISVTQLL 
ERFLFPSATHGKKVYKLSGGEQKRLYLLRLLVHKPNVLLLDEPTNDLDTETLTILEDYID 
DFGGSVITVSHDRYFLNKVVQEYWFIHDGKIEKIIGSFEDYESFKKEHERQAMLSKQTEQ
QNKHKHQPKKKTGLSYKEKLEYETIMTRIEMTETRLEDLEQEMINASDNYARIKELNEEK 
EQLEATYEADITRWSELEEIKEQ*

Sequence 3333 

Contig_0815_pos_935_528

is similar to (with p-value 1.0e-32)

>pir:pir|S30712|S30712 DNA helicase - Escherichia coli

atgcaagagactttatcgcattattttggttataagtcatttcgaccaggacaagaagaa 
attataactaaaatattaaatcatcagcatacactgggggtactacctaccggtggcggt 
aaatcgatatgctatcaagtaccaggtttaatgcagggtggcacaactattgttataagt 
ccacttatttctttgatgaaagaccaagttgatcaactacaagcaatgggaattcaagct 

gcatatttgaatagtagtttgactcataaacaacaaaaagagattgaagagcaaataaag
cgtggtgccattcagtttttatatgtagctccagagcgctttgaaaatactttttttcta 
aatttattacgtaaaatagaaataccccttatcgcttttgagcaataa 



Sequence 3334 

MQETLSHYFGYKSFRPGQEEIITKILNHQHTLGVLPTGGGKSICYQVPGLMQGGTTIVIS
PLISLMKDQVDQLQAMGIQAAYLNSSLTHKQQKEIEEQIKRGAIQFLYVAPERFENTFFL 
NLLRKIEIPLIAFEQ* 



PU3480 

SEQUENCE LISTING 



Sequence 3335 
step.1000b10.cons.ok

TCATACCCATAATGGCAGCACCTTGCATTAATGCGTTAGCAACATTGTTAC 
GTCCATCGCCAACATAAGTAAAGTTGATATCAGCATACTCTTTTTTCAATA 
CTTCTTTAGCAGTTAAAAAGTCAGCAAGCACTTGTGTAGGGTGATCTTCAT
CGGTTAATCCATTCCATACCGGAACACCTGAATATTGCGCTAATGTTTCTA 
CAGTACGTTGAGAGAAACCTCGGTACTCAATACCATCATACATACCACCAA 
GTACACGTGCTGTATCTTTAGCAGTTTCTTTTTTACCCATTTGAGAACCTG 
TAGGTCCAAGATAAGTGACGTGTGCACCTTGATCATGTGCGGCAACTTCGA 

ATGCGCAACGTGTTCTAGTGGAATCTTTTTCAAAAAGAAGCGCGATATTTT
TATTTTTTAGCATAGGCTTTTCAGTGCCGATATATTTGGCACGCTTCAAAT 
CTTCGGAGAGTGTTAATAAAAATTCTACCTCTTGTCGTGAAAAGTCTAATA 
AAGTTAAAAAGCTTCTGTTACGTAAATTTTTCATCAATAACAACTCCTTGT 
GAATGAATTACAATTTTATTGTAAGCGTTTCCTTGTAGTTTTCAACTTAAC 

TCAAAAATTGTAATTTTTTATTAATAAATCAATGAACTTTTTATTAACATT
GAATTTGAATAGTGTGAAATTTGTCAAAAAAGGTTTATGTTTAAGTTAAAA 
CCAACTTTAAAGAATTTAGAAAATTCAATTAAAGTGAATTTAACACATTTT 
AAATAATGAAGGTAGATTAGAATCAAAATGTAAGAGGTATGTTAGTTGAAT 
AAACCGTATTAACTTCTTTATATTATTAGTGAATTCTCAAAATACTTGTAA 

AATTATATTGATTAATTACTTGTTCTTTATTATATTTAGAAACATCTCTTA
CAATAAACCACAACTTACATGAAATTTAAGCTAACAGTGGATTATTCTAAT 
TAAATTGAAGGTCATTCAAAGTAAAAAGGGCATATTAGTATAATAGGAGGA 
ACAGTAAGATGAAAGTGTATAAGTTGAATTATCAACACCATAAAGATATTG 
TTGATGATAATGTATTAACGATGTTCGTAACAGCTGATAATCAAGATGAAG 

TTGAAGCGTTTGCGAAAAAATTACATTACAAGATTGAACATTTATCTCCAT
TAACTAAAAAGGAATTTGAAGATGAGAAAGCGAAAGATTCACACTATAGAC 
TTGAACACGTGGATCACTATTTAAATTAATTATTCAATCATATATCAGAAC 
ATCCTGTGATCATTCGTTTGGTTACAGGTTTTTTACGCATGAGATTAAATG 
TTAATGTCACTAATACAAGGAGTGTATAAATGGAAATAACAAATGTTAATC 

ATATTTGTTTTTCAGTGAGTGATTTAAATACCTCTATACAATTTTATAAAG
ATATTTTACATGGTGACTTATTAGTATCAGATAGAACGACAGCATATTTAA 
CTATTGGTCATACTTGGATTGCACfGAATCTAGAAAAAAATATACCAAGGA 
ATGAAATAAGTCATTCCTATACGCACGTTGCTTTCTCCATAGATGAAGAAG 
ATTTTCAACAGTGGATTCAATGGCTTAAAGAGAATCAAGTAAATATTTTAA 

AAGGGCGACCAAGAGACATTAAAGACAAAAAATCGATATATTTTACAGATC
TGGATGGGCATAAAATTGAATTACATACTGGAACATTAAAAGATAGAATGG 
AATATTATAAATGTGAGAAGACGCATATGCAATTTTACGATGAGTTTTGAT 
ATTTTAATTGATATGTAATTTTTTGAAAAAATTTATGATTCATTTTGTAAT 
AA/^TTACTATATACTAAAATTTGTAATTTTCAATTATTTTAATGAGGTGA 

ATCAATGAATAAATTATCAAAGTACATTGCAATAGCTACATTAGCATCGAC
AGTTACAATTTCAGCACCAATTCATACATATGCATGTGAATCTCATACTAA 
AGATAACCATAATCAAACGACACAACATCAAAATGACCCCAACCTTGGTGA 
ACAAAATGTAATGGCTGTCTCATGGTATCAAAATTCTGCCGAAGCGAAGGC 
ACTTTATCTACAGGGATACAATACTGCAAAATATCAGTTAGATGAACATAT 

TAAAAAGGATAAAGGTAAGAAAAAACTTGCTATAGCTTTAGACTTAGATGA
AACAGTTCTTGATAATTCACCATATCAAGGATATGCTTCTATGCACGATAC 
GTCTTTTCCAGAAGGATGGCATGAATGGGTTGCTGCAGCGAAGGCAAAACC 
TGTTTATGGCGCCAAATCATTCTTAAAATATGCTGACAAAAAAGGTATCGA 
TATTTACTATATTTCGGACCGTGATAAGGAAAAAGATTTTAAAGCTACAAA 

GGAAAATTTAAAAAATATTGGACTACCGCAAGCGAAAGATAATCATATTTT
ACTAAAAGGGAAAAACGATAAAAGTAAAGCATCGCGACGTCAACAAGTCGA 
AAAAAATCATAAGTTAGTGATGTTATTTGGTGACAATTTGTTAGATTTTAC 
AGATCCTAAAAAATCTACTGCTAAAGAACGTGAGAAACTCGTACAGAAGCA 
TGCAAAAGATTTTGGCAAAAAGTATATTATTTTCCCAAATCCAATGTATGG 

AAGTTGGGAATCGACACTTTATCATAATCAATACGAAATAAGTAAGAATGA
AAAGGATGAACTACGAAAGTCATCAATTAAACAATTTAATCCTAAAACGGG 
TGAAGTAAAATAATTCCTGACATTAAATACTAAATTTCTATTAGTAAAAGA 
AAGTTTTACACATAATTATTGTATTGTGTGCAGACACTTTAATAATTAATA 
AAGTTATATTTACATGGAGTACTTTAATTCCGTGTTATACCATTGCGAGGA 
TGAAACAATGTACGCGTTGTTCCATCCTTTTTTTTATAAATATTTTCAATC
GTATCTAGATTAATGTAAACTTATAATATCGAAAAGTTTACAATGTGCAAT 
CGAATGATTTGAAAATGCT7UVTTTTTAAGAAAAAGGAGTTAAAGATTATGA 
CATTAAACCTAGCTCAACGTGTGTTAAATCAAGAGTCATTAACAAAAGATG 
AAGCAATATCTATTTTCGAAAATGCTGAAATTGATACATTTGATTTATTAA
ATGAAGCCTACACAGTGAGAAAACATTACTATGGTAAAAAAGTTAAGCTTA 
ATATGATATTAAATGCTAAAAGTGGTATCTGTGCAGAAGATTGTGGGTACT 
GTGGGCAATCTGTAAAAATGAAAGAAAAGCAACGTTATGCACTTGTTGAAC 
AGGACCAAATTAAAGAAGGCGCTCAAGTGGCAACTGAAAATCAAATCGGTA 
CATACTGTATTGTTATGAGTGGTAGAGGTCCTAGTAACAGAGAAGTCGATC
ATATTTGCGAAACAGTAGAAGATATTAAAAAGATACACCCACAACTAAAGA 
TTTGTGCGTGCTTAGGATTAACGAAAGAAGAACAGGCTAAAAAATTAAAGG 
CTGCTGGTGTCGATCGTTATAATCATAATTTAAATACGAGTGAGCGTTATC 
ACGATGAAGTAGTAACTACACATACATATGAGGATAGAGTGAATACGGT 


Sequence 3336

step.1000c01.cons.ok

TAACATCAGTGTTCAGTAAAAATTGGCTTCATTACTGGAGAAATGAATGTG 

GTTATACTATAAAACCGGTTTCTGAGACAAATGACGTTCGGATATATGATC
AACAACAATTTATAATGTATGCGCATTTTGCTGATGAAGCTTATCAAAAAA 
TTGATTACATTAATTATTTTGACACATCTAGAAGAA7VAATAAAAAGAGAGT 
TATACGATACACGCGGATTTTTAAGTTGTGTTAGAGTCTTATCAACGGACC 
AAAAAATACAAGCGGAGTACTATCTTTCTCCTCAAGGAAACGTGAAAATTG 

AAAAGTATTATGATATTAATTCAAATCATCCCTATGAGGCAAAAAAGATTA
TATTAAATCATTTAGGTAAAACGCACTTTTTAAACAATGAAACTGAGCTAG 
GTGCTTTTTTTATTGAAACAATATATCAAAATAGTGACTTATTCTTTAGCG 
ATCGTAACCTTGTTACATCACATATATTTAACATAGTTGCATCTTACATCC 
CTGTTATTGCAGTGTTACATAGCACTCATGTTAAAGACGTTACTGATTTAG 

CGCATTCACCAATCAAAAATGTTTATAAAGGGGTTTTTGAACATTTACAAA
GGTATAAAGCCATCATTGTTTCAACCCAACAACAAAAAGCTGACGTAATTG 
AGAGAATCAGTGGAGTTATTCCAGTATATGCTATTCCAGTTGGCTATTCAT 
CTATCGATATGAAAGATTACTCTAATGAAAATAAATATGTTTCACCTAAGA 
AAATTATTTCTGTCGCAAGATACTCACCTGAAAAGCAATTGATACAACAAA 

TCGAACTTATTAATAAGCTTAAAGATTCATTTCCAAATATCGAACTACATA
TGTATGGTTTTGGTAAAGAAGAACAACATTTAAAAGAACGAATTCAAGAAT 
TAGGATTAGAAAAGCACGTAATATTAAGAGGTTTTTTAAAAGATTTAACTG 
ATGAATATCAAGATGCATATCTAAATTTAATAACAAGTAATATGGAAGGCT 
TTTCCTTAGCATTATTAGAATGTGAGTCACATGGTGTACCTTCTATCAGTT 

ATGATATTCAATATGGTCCAGGTGAATTGATACAGGATGGTAAGAATGGAT
ATTTAGTAGAAAAGAATAATCAACACATGCTATTTGAAAAAGTAAAATTAT 
TGCTCAACAACCCTCAACTTCAACAGTCTTTTTCTCACCATTGTATTGAAA 
CTGCACAAAAGTATTCTCGAACACAAATCATGTTACTGTGGAAAAATTTAC 
TTCAACATTTTAACTAAAGCATAAAAATATAAGTCGTCAAATTTTGCTATG 

AAGTCATTTATCATTAAATGGTTTCATAGTATTTTTTTGTATTAAATAAAA
GAATTTTTATTAATTCAATAAAAAGGAACCATGAATTCATTTAAAATTCAC 
GGTTCCTCTATTAAATCATTGTCAGATTGTGAGTTGTATATGTTTCTCTAA 
AAATCATACATTTAAGTCATTTTTAATAGAAATGACCTTGTTTGCATCATC 
TTTTTCACAAGATATGATAAACGTGAAAGTTATATTATAGTTTTAATCAAC 

ATCTTACATTTGTTGGACTAAATCTACTTTCTATTCAAATCTCATATCTGC
TTCTTTCTACAAATTTCATTTGAACATOTTAATTTTTATTTTTTCTATTTT 
TGCGACGTTTCCCTAATAATAACGCTCCTAAACCTGCAAACAGAGTTCCAA 
GTAACGTGCCTTTAGAGCCATAATCTTCATTAGCTCCTGTATCAGGTAATT 
TATCTTTTGTACTCTTATCTGAGCTATTGCCTAAATCTGAGTCGTTGTCTG 

AGTCTGAATCACTATCGGAATCCGAATCACTGCCTGAGTCTGAATCACTAT
CAGAATCTGAGTCACTATCCGAGTCTGAGTCACTATCCGAGTCTGAGTCGC 
TATCGGAGTCTGAATCACTATCGGAATCTGAATCACTGTCTGAGTCTGAAT 
CACTATCAGAATCTGAGTCACTGTCTGAGTCTGAATCACTATCGGAATCTG 
AGTCGCTGTCTGAGTCTGAATCACTATCGGAATCTGAGTCGCTGTCTGAAT 
CTGAATCACTGTCTGAATCTGAATCACTGTCTGAGTCTGAGTCGCTGTCTG
AATCCGAGTCGCTATCAGAATCCGAGTCGCTGTCTGAATCTGAGTCGCTGT 
CTGAATCCGAGTCGCTATCGGAGTCTGAATCACTATCGGAATCTGAGTCGC 
TGTCTGAGTCACTATCGGAATCCGATTCGTCATCATAGTATCCGTTATCTA 
TACTAAAGTCATCATGATCAGTAATTGTTACATGAACTTCTTCCCCATCAG
CATCCTGTTCGTCATCATCACCAGAATCTGTTGTTGTTTGAGTCATACCTG 
AAGGTTTATCAAAATGAACAATATAATTACCACTATTTAAATTATCAAATT 
GATACTTTCCATTTTCATCGGTTGTAGTTGTACTAATGATATTTCCGTTTT 
CATCTTTTAACGTCACTTTAACTCCAGAGATTCCTTTTTCATCATCACCTT 

GAATACCATCTTTATTAGTGTCATACCATACATAGTTCCCTAAGCTGTATT
TAGGTGTTTGATGGGGAAAACAATTATGTTAAGTATTAAAAACTTAACCAA 
GATTTATTCAGGGAATAAAAAAGCGGTAGATAATATTTCTTTGGATATTCA 
ATCTGGTGAATTTATCGCTTTTATTGGGACGAGTGGTAGTGGTAAAACAAC 
TGCACTACGCATGATTAATCGTATGATTGAGGCGACAGATGGACAGATTAT 

GATGAATGGAAAAGATGTCCGTAATATGAATCCTGTTGAATTGCGGAGAAG
TATCGGTTATGTCATTCAACAAATTGGTTTGATGCCACATATGACTATTCG 
AGAAAATATTGTTTTAGTACCTAAACTTTTAAAATGGTCTAAAGAGAAGAA 
AGATGAAAAAGCTAAAGAACTTATTAAACTGGTAGATTTACCTGAAGAATA 
TTTGGATCGTTATCCAGCTGAATTGTCAGGAGGGCAACAACAACGAATTGG 

TGTTGTGCGCGCTTTAGCAGCTGAACAAGATATTATATTAATGGATGAACC
TTTCGGTGCATTAGATCCTATTACACGCGATACATTACAAGATTTAGTAAA 
GGAATTACAACAAAAATTAGGAAAAACATTTATTTTTGTCACTCATGATAT 
GGATGAGGCTATTAAATTAGCAGACAAAATATG 


Sequence 3337

step.1000d09.cons.ok

AAACAGAAACGCGAGCTTCAATTAAAACCTCATCACCTTCTTTAGGCTCAA 
AGCCTAACTTAGAAGCATTTCCTTTAAACATCATTCCACTGATGACACTTT

CTTTGTCCTTGACATTAAAATATAAATGTCCGCTTGAATGCTTTTTAAAGT
TAGATAATTCTCCTTTAATAAGAACTGACTGTAGATGAGGATCTTGGTCAA 
ATTTATATTTTATATATTTAGTTAGTGCAGAAACACTTAAATATTCAGTCA 
TAGTTATCACTCATTTTAATCATTTATATTACTTAAAACACCATTAACAAA 
TTTATAATGATCATCATCACTAAACTGTTTTGTGAGTTCTACAGCTTCATT 

AACAACTACTTTTTTAGGTGTGTCGCTGTGCAATATTTCAAAAGTTGCCAT
TCTTAAAATAATACGATCTGATTTCAGTAAACGATCGATAGACCAGTCTTT 
TAAATGGGGTTTAATTGTTTCGTCTAAAACGATTTGATGATCTTTGACTCC 
AGTAACTAACCAGTATATAAAATCAAAGTCTAAATCAGAATGATCATCTTT 
AATAAATTCAATTGCTTCTTGAATTGTTAAATCTGTCTCTTTTATTTCAAG 

TTGAAATAAAGTTTGAAAAGCTTGTACTCTTGCATCTTTACGACTCATATT
TTACTCCTTCAAATGTATTTATAAAATTATAGGTACAACTTTCTAGTAAAG 
TGCACCTATAATTTAAATTATTTATTTTTCTGCGACGATACTTCTGATGTG 
AATATTAATTTGCTGTGGTTCAATAGCTGTCATTGTCGTAATTGAATTGAA 
AATTGCTTCTTGAATTTGATTTGCAGTTTTAGAAATATTTACGCCATGTTT 

TAAAGAACAAAATACATCAATGTATATTCCGTCTTCTTTAGCTTCGATTTT
TAAATCTCTGTTTAAATTTTTTCGACTAATCTTCTCTAGATTTGTTTTTTT 
TAGTTCAGCAAAATGGCCTGTAATACCTTCTACTTCTGATGTCGCAATGGA 
TGCGATAACAGATAATACTTCTGGTGCTATTTCAATTTTTCCTAAATTAGA
TTGAGAATAATCTGCTACATTGACCATTCTTTATCCTCCTATTGGTTATTA 

TCGTCCATAATATTATACTTTTCTAGGAATTTTGTATTAAATTCCCCACTT
CTAAAAATATGATTATTTAGAAGTCTTAAGTGGAATGGAATCGTAGTGTCG 
ATACCTAAAACAAGATATTCACTTAAAGCACGAATGCCTGTCATAATTGAT 
TCTTCACGTGTAGGTTCGTGAACTATAAGTTTTGCCACCATGGAGTCATAG 
TAAGGTGGTATCGTATAATTAGTATAACATGCTGATTCAATTCTCACTCCA 

AAACCGCCTGGAGCAAGATATTGGGTAATCTTGCCTGGTGATGGCATAAAG
TTTTTGTAAGGATTTTCAGCATTGATTCGAAATTCAATAGCGTGACCGTTA 
ATGGAAATATCTTCTTGTTTAAAAGGTAACGCCTCACCCATAGCAACTTTG 
AGTTGTAATTTTACTAAATCTACTCCTGTTACCATTTCAGTTACTGGGTGT 
TCAACTTGAATACGTGTATTCATTTCCATGAAATAAAATTGGTTATCATCT 
AAATCATATATAAATTCAATTGTACCTGCGTTTTCATAATTTACAGCTTTT
GCGGCTCTAATTGCAGCATTACCCATTTCTTGGCGTTTATCTTCACTTAAA 
ACTGGTGAGGGTGCTTCTTCAACGAGCTTTTGCATTCTTCTTTGAATTGTA 
CAATCACGTTCACCTAAATGTATAACGTTTCCATAAGTATCGCCAATAATT 
TGAATCTCTATATGTCTAAAGTTTTCTATAAATTTTTCTAAGTATAAACCA
CCATTTCCGAACGCGGTTTCAGCTTCTTGTTGTGTCATACGGTAACCAGTT 
TCAAGTTCTTTCTCATCACGAGCAACCCGAATACCTTTTCCACCACCACCT 
GCTGTGGCTTTGATGATAACTGGATAGCCGATTTTTTTAGCTATTTTTTTA 
GCGTCATCTATACTTTGAATAAGTCCTTCACTTCCTGGTACTACAGGTACA 

TTGGCTTTAATCATTTCAGCTTTAGCAATATCTTTTATTCCCATTTTCTGA
ATTGATTGATAGCTTGGCCCAATAAATTTTAATTGGCATGCTTCACACAAT 
TCAGCAAAATCACCATTTTCAGCCAAAAAGCCGTATCCAGGATGAACACCA 
TCACATCCTGTAGAAGTAGCGATAGATAAAATATTTGGTATATTTAAATAA 
GAATCTTTTGATAAAGTAGGACCGACGCAGTATGCTTCATCAGCAATTTGA 

GTATGTAATGCATCTTTGTCACCTTCTCCGTGTTACAAATTATGTTACAAG
TTTGCATGAAGGTTTATTTAACATCGCTGCAGCTGTAGGTGTTCATAGTCC 
AACGGAGATTACTTCCGACCATATTATCTATAGACAATTAGATGGCACTAC 
AACGTCCATTCAGGATTATAAACTTAAATTAATTTCTTAAATCACAACATT 
GAAAAGGAGCGAGACAGAATGCTGTCTCACTCCTTTTCAATTTAATTATCT 

TCATTCATTTTACTTTTATCTTTTAAACCATTACGTTTGAGTTGTTTAACA
ACCCTATCCATACTCACATGGTCACCCTTTTTTGCATCTGATGATATGAAA 
CCTAAATTTTTCTTAAGCTGATCTATATCCGCATCTTTATAATCAATATCG 
ATATGTTCGATGGCCTTTCGATCTTTAAAATCAACTTTATATGTGACCCCG 
TCTATCCCTTTGAATGCTTGTTCATCATTTTTAAACATTTGTTTAGCTTCA 

TCTTTATCGATGCCTAAATCATCATATTTAATTGTTGCTATAGTAGATTGT
TTGAGAACTTCATCATCTTTGTATGTCAAAGATGTAATCACTTGTTTACCG 
TTGACTTCGCCTTCATAAGTTTTTGTGCGTTCTTTCCCACATGCCGAAAGA 
GATACAGCACTTATAACTAATACTACTAAAAGCGATATGACTTTTTTCATA 
GTAGTCGCCTTCTCTCATCTTTAATCATGTTAATTATATCATCTTTAATTT 

GAATATGTTACTTATATGAAGATAAATATCTTTTTATTTCGAAAATTTAAA
CATCTATAAACATGGAATTCAGAATTTCACTACTTAGAATGTTATACTACT 
TTTAAAGACTTATTAATGGAGGAATGACTATGCCAGAATCTAAAGCGTTAC 
TTAAATCTTTAACAGATGTTAATGGTATTTCTGGACATGAAATGCAAGTTA 
AATCT 


Sequence 3338 

step.1000f04.cons.ok

TAATATCTTTTTTATTTGGTAACC7VAACGTTTTCAGCTTGAGTAAATGGAT 

AAATTGTATCTGATGCGGCTACTCGAGCTATTGGAGCTTCTAATGAAAGAA
TTGCTCGCTCTGCTAATTCTGCTGCCACTTGTGCACCCACACCAGCTTGAC 
GTTGTGCTTCTTGTACAACTACAGCACGTCCAGTTTTCTCAACTGATGCTA 
CTAAAGTATCTATATCAATTGGTTGTACAGTACGTAAGTCAATAACTTCAA 
CTGAATAACCATCTTTTTCTAACTCTTCAGCAGCTTTTAGTGATTCTTGTA 

CCATTGCCCCGTAAGATATTAGAGTAATATCATTACCTTCTTTTTTAACAT
TGGCTTTTCCAATGTCAATTTTGTATTCTTCTTCAGGAACCTCTTCACGGA 
AAGAACGATATAATTTCATATGTTCTAGATATACAACTGGATCATTACTTT 
GAATAGAAGAAATTAATAATCCTTTAGCATCATAAGGACCTGATGGAATAA 
CTACTTTGAAACCAGGTGATTGAGCTAAGATACCTTCTAAATTATCAGCAT 

GCAACTCTGGAGTGTGGACGCCACCACCAAAAGGTGTACGAATTGTAACAG
GCGCTGGTTTAGTTCCACCTGAACGGAAACGAGTACGAGCAATTTGACCAG 
CTACTTCGTCAAATACTTCATAAACGAATCCTAAGAATTGAATTTCCATAA 
CAGGACGGAAGCCAGTCACTGCTAAGCCTAGTGCAAGCCCACCAATTCCAG 
ACTCTGCTAATGGTGTATCAAATACTCGATCTTCGCCAAATTCTTTTTGTA 

AACCTTCAGTAACACGGAATACACCACCGTTAACACCAACGTCTTCACCGA
AAACTAAAACGTCTTCGTCTCTTTTGAGTTCACTTTTAAGCGCATCGTTAA 
TCGCTTGAACCATTGTCATTTGTGCCATGGCTTACTTCGACTCCTTCTCTT 
TGTAAATTTCATATTGTTCTGCTAAATTTTGAGGCATTTCTTCATACATGA 
TATCCATTAGAGAAGTAACAGTTTGTTTTTCTGTATTGTCAGCCTCTTTAA 
TAGCTGCTTTTATTTCAGATTTTGCACGTTCAACCACTTCATTTTCTTTGT
CTTCATTCCAAAGACCTTTAGCTTCTAAATATTTTCTGAAACGTACTAATG 

GGTCTTTTTTCTCCCATTCAGCATCTTCATCTGAAGTTCTATAACGAGTAG 
GATCATCACCAGCCATAGTATGTGGTCCATAACGATAAGTTAAAGTTTCGA 
TAACAGTAGGACCTTCTCCTGCTACTGCACGTTCACGTGCTTCTAATGTTG
CTTGATAAACAGCTAAAGCATCCATACCATCAACTTGAATTCCAGGGATAC 
CAACTGAAATAGCCTTTTGTGCTAATGTTTCTGCAGCTGTTTGTTTACTAC 
GTGGTGTAGAGATGGCATAGTTATTGTTTTGAATTACAAAAATTGCAGGTG 
CTTTGTATGCAGATGCAAAGTTAATTCCTTCATAGAAGTCACCTTGTGATG 

AACCACCATCACCTGTATAAGTAATTGCGACTGCATTTTTGCCACGTTTTT
TAAGTCCAAACGCTACACCGGCAGTTTGAATATATTGTGCACCGATAATAA 
TTTGAGGGCTAAGTGCATTAACTCCCTCAGGGAATTGGTTACCTTTGAAGT 
GTCCTCTTGAGAATAAGAATGCGTCTGTAAGAGGTAAGCCATGCCAAATAA 
TCTGAGGCACATCACGATAACCAGGTAAAATGAAGTCTTCACTTTCTAAAG 

CATACTGAGATGCTAATTGTGAAGCTTCTTGTCCTGCTGTTGGTGCATAGA
AACCTAAACGTCCTTGTCTATTTAACGAAATAGAACGTTGATCAAGAATTC 
TAGTCCATACCATTCTTTCCATTAATTCCACTAATTGTTCGTCTGTTAAAT 
CAGGTACTAAGTCTTCATTAACGACATTTCCGTCTTCATCCAAAATTTGAA 
CCATTTCAAATTTCGATTGAGTCTCATTTAAAACTTTAACTGCATCGAATT 

GGGCTTGTAACTTAGGAGCCATTCAATTCACCATACCTTTCCCATGTAAAT
AGAAATTCATTTTATCCTCCTTAATTGTATCACAAATAAAAATCTCTGTTA 
AACGATTTCATAAAGTTCTGTTCCACATAAAAATAATACAGTTACATGACC 
TGTTGTATTAAAAATCACGTAACTGTATTAACCCATTATTTATTTTAATTG 
GTCTACATCTTGTTTTTCTCTTTTTACTTTCTCAATGGCTTTTGAGTAAGC 

ATTAAATTTATTATTCATTTCTTTATATGCTTTAGAAAGATCTTTCGATTT
TCCGTCTACTTCCGATTGTGTTGCATTATCTTCATTCAAATAAGAAAACAG 
TTCTTTTTCCTTATTAAGTGCTTTTTTGTAAGCCTTTGCATAAGCATCATG 
TGATTTATATTTTTCTTTAATAGCACTATCAAGTTGTTCAACTTCTTTTTT 
CTTTGCTTTGTTTTCTACATGTTCAAGATATTGCTTGGCTTGTTT7U\ATGC 

TTTTTCAGAATTATCTAGAGCCTTCTCTTCTTTTTCAAATTCTTTTTGTCT
TTGTTTTACATTTTCAACTATATCTTCAGCTGCTTTTTTACGTGTGCTTTG 
ATCTTTATCATTTACCTTTTTAAATAATTTTTGCTTTTTCTCTTCTAATTC 
GTTTATTTTTTTACTTACTTGATTAACGGTTTTCTCTTGGTCCATAGCTTT 
TTGTACTTGATCATCATATTTTTTAATTTCTTTTTTATCTATAGTACATCC 

AGCTAATAACACTGCTGACGTTAGGACAATTGCTACAGTCTTTTTAAAATT
CATGAACTGTTACCTCCTCAAAGTTGTTCGCTTGTAATATTAAATAATAAA 
TTATAATAAATCAATATAAAGCTATCATTCTATATTCTAGTGCAAACATTT 
AAAGCAATAAAATCTTTGAGTATCTATTAATGTACTTGTGTCTATCTTATC 
TGATATATTATTTAATAAGGAAGGTGTATGACATGATAACAATGAAAGATA 

TTATAAGAGATGGTCATCCAACACTTCGTGAAAAAGCGAAAGAATTAAGCT
TCCCACTTTCTAACAATGATAAAGAAACATTGCGCGCAATGCGTGAATTTC 
TAATCAATAGTCAGGATGAAGAAACCGCAAAACGTTATGGTTTACGTTCTG 
GCGTAGGTTTAGCTGCTCCACAAATTAATGAACCAAAACGTATGATTGCTG 
TCTACTTACCTGATGATGGAAACGGTAAATCGTATGATTATATGCTCGTAA 

ATCCTAAAATAATGAGTTACAGTGTACAAGAAGCTTATTTACCAACTGGCG
AAGGTTGTCTAAGTGTTGATGAAAACATCCCAGGTTTAGTGCATCX3TCATC 
ATAGAGTCACTATTAAAGCTCAAGATATTGATGGAAATGATGTTAAATTAC 
GTCTCAAAGGCTATCCTGCAATTGTATTTCAACACGAAATTGATCATCTAA 
ATGGCATTATGTTTTATGATTATATTGATGCCAATGAACCTCTAAAACCAC 

ATGAAGAGGCCGTAGAAGTCTAAATCATGAATTATATTAAAACACTTCATT
AAGTTGTATTTTATACTTACTTAATGAAGTGTTTTTTTAAAAGTAATTCAA 
ATGTCATTAGGATTAATTT 



Sequence 3339

step.1000f07.cons.ok

TCAATATTATTTTTCCTTCAATATCATTTTATTAATATAATTAGGTAGAAG 
TCTACGCTGTCTCTAAGTGGTAGATGAATGATCTCTATTTTTTATTATGTT 
ATACATTTATATTATTTTCTGTAAGGTTCTGATTCTAAAAGATATTTATAA 
TAGGAGTCTTTCTTTTGTAAATCTGTTTCCCCCCACACCTTATATTCTTTA
TCATCTGTCCATGGTGCTGGTGTCACGTTATACTCATACTCACTATATGGT 
TGATCTTTAAAAACAATATCTTTTTCAGCATATCCCATTATAATATTAAAA 
CTCATTTTTTCATTTTTTATTTTTTTATCCCAACCCTTACGTTCTATAACG 
TGATTGATGATTTTCTCACTCTTGTGTTTCTGAATATATTGTAGTATAAAG
AAAATAAAGTTTATAAAATAGAGCTTGATGATATGAATGACAATAAATAGA 
CAATAAATAACCGCATCATAATCTAAAATGATTATAATACGGTAAATTTTG 
ATATTATTTTATAACTCAATTGTAGCTGGCGTATCTAGTTCATCATCATTC 
ATTAATTTAATTGAACGAGGTATAACTTTGAGTACACATAATTCAGGATCT 

TCTTTTGAATTGAAAAATGTTTTGTCTTGAGTTTCCCATAACCAATCAATA
ACTTTTTGATTTTTGACAACTTCTATAGTGGCATCTATTTCAACAAAGCTA 
TTGTTTGTTGTTTCGTTATAGCCTAATAAAATATGAGCATCAGGATTATTT 
TCTATTTCTTCGACTTTTAGTGAATTGATATTCGTTTTTGTATATAAGTTT 
AAGTCATCGTTGTAAAAGACCATATATCTGCTATTAGGTTTATTATGATGT 

GCTGTTGATAGGACACCAATTTTTGAAGAATTTAATACTTTTTCTATTGCT
TTTGTCACTTGTTGTTTATTCAAAATTATCATCTCCTTATTATTGTATATA 
TCCTTATCACACATAATTTAAACGAAACAACCTTATGATGATTTTTAAGAA 
GGTTTAAAACAAATATATTTGACTTCATCGTGTTTTAAGGTGACTTGGAAA 
GGAAGATGATGGTCATGTTGTATAGCTAATTCTGTTTTTGGTTGATTTGAT 

TTATCAAGTTCATCAAGAATACTACGTTCAATATAAAATTGTTGTTTAGTT
TGTGGTAGAAGGTTTTGAATTGCGCCATGCTCATGATTTAACGTTTTAATA 
ATAATTAATGAGTTGGTTGATAATGTATTTTTAAAAACGTAGCGCTGTTGA 
CTATCAGATAGATAACGATCATTTATTTTATTAAATAATAAAAAGTGATAG 
TTACCATCTTTGCGACTCATTAAAAAATTGTTAGCTATCTCTAAAGGTTCA 

TTTAAAAATGGTTCGATTAATTTATAGAGGTGCATGAGAGGTAGGGGACTA
CCATAACGATTAAATAAGGCAATAAAAGGACTCTCATCTTCTATCAGTTGA 
TAACTTACACCACCCCCGTAACGCATTAATGCGACTAAATGACAAACAAGG 
TCAGATAATTGAAGTGGATAAGTACCATTGTTTGTATATTTAAAACACTTT 
AAAGATAAATTATTGTAAACAAATTTAGTTGAAGGTATACCGGAATCTAAT 

ATAAATTGTTTATAATTTTCAAAGTGTGTGGACGAATGTAAGTACGATTTA
CGTTTGGTAATTAATGAGTTTTTGGTTTCTAGATTTTCAATGTCTATAAAA 
TAGTAATCGAAATTTAATCGCTTCATCATATCATAAGTTCGATCAATAGAA 
GAATTTTGGTGTAGTAATCCTTCAACAGTTATACTATAACGGATAGCTTTA 
TTTTTATTTTTAATTTTTAAATGGCATAAATGTATATCGTTCACTGACATA

TTCTTAGTTTCAAAGACTAACATAAATTTAACCATTTTTTTGTTCATTTTA
TAATCTTGGTGGCTATTAAGAAATAATAAAATTATTTCTTTGATAGATTCA 
AATTCATTTGTAGATGTTAATCTCATGGCTAAACCTATATTTTTTTCGAAT 
AATTTTTCAAAGCATCGATTTAATAAATTAAAGTTCACCTCTCGCGTCGAA 
ATATCAGTGATATCATTGATAAAAATTACAGCTTGTGGCAGGGAGGTAAAA 

TCAATGTTGTAATATTCATTAAAAACAAATTGAAATAATTCGTTGAAATTT
TGAAAACGTATAAACGCTTTAGAGTTTTTAGTCTGATCTTGAGGATAAAAT 
TCATTGATATCTCTTTCAGTCGTTTCAGTAGCCAAATGATCACTAAACTCA 
AATTGATTAATTAAATCAATATATTGTGAAAAGTCAGTATCACTAAAACTT 
ATCGAAGGTAAGGAGTCTAACTTAGAGATAArcGTTCGGTATTGTTTTGGG 

CTACAACCTAAATAATTTTTAAATTGATTTGTAAAATTGGTATGACTACTA
AATCCAGATTGTTCTGAAATAGCGTTAATTGAGTCCTCAGATGAAAGTAGC 
CTTTTTATAGAGTCAATCACTTTTAAGCTTGTAAAATAATCTTTAAAATTC 
ATATTAAGATATCTTGCAAATAGGTTAGAACAATAAGATTCAGATATATTA 
CAATGTTGAGCTACTTCTCGTAGTGAAAGTGACTGTGATACTTTTGAATGA 

ATAAATGTTAATCCTTCAGTAAACACTGAATGATTAACTGCAATTTGAGGA
ATATATTTTTTCTTATATCGAATGACTGCTTCTTTTAGTAGCGTTTGTATT 
ATTTTAGAGATGGATTGCTCATCTTGATTTTCTCCTTTTATTAAATGTTGA 
ATAGCTTGTAAAATAATTGTTTTAACAAAACTGCTTGATTGTAATAAATGT 
CTGTCAAAATAACATTTAAAAAATTTATTATCTTTACTATAAAAATAAATA 

ACTGGAATAGAGAGTAATACGAGATTTGAAGCACTATTAATAAAATAGATG
TCAGATTGATTAATGATAGCTATATGATTACCTAAATCTTTTGTTTCACCA 
TTAATAGTGATTTTTAAGTCATTTGTAAGTGAAAACATCAGAATAATTCTA 
TTAATATTGCGTTCAGGGGGAGAATGATATTCTTTATATATATTTAAAAAA 
CATGTCATTGTATATTACATTAGCTCCTTGAATGAATTGAATAACTAAATT 
TAAATGTGACCGTGT



Sequence 3340 
step.1000f11.cons.ok

CCATAACCATGATCAATAACTAGTGGATAATATTTAATCCGTCCAGAATGT 
GCAAAATAACGTTCATCTTCAACTATCAAATGATTAACCTTACTCATTGAG 
TTCCCTCCCAATAGTGACTCTTTTTAATATAGCTATCTACATATGAAGCTA 
TAAA/^CTATCAACTGCTATCGTTTTAAAATAAATCAATATTAATAGGATA 

GTAATCAATATACTAATAATTTTCAGATTTGTAAATATTTTCTGATAATTT
AGAATACTTACGTTCCGAATCTTTTGTGATAAAGAAATATATGTCATTGAG 
GCTTAGAATATATGTTTAAAGGGAATTTAGAATAAAATATTTTACGGAGGA 
GATTAATATGCATTGGTTAAAAATATTTTATCATTTATTATGCGCAACCAC 
GATTAGCGTGATATTACTTATTATAACTATATTAATGGATGCGTTACTACA 

AAACACACACTTAACTCAGTTATTACTCAATATTGATTTTTTAATCAACCC
TGATGAAGTGCCAACAATTATTGAAGTACTGATTCATTTAAGTATTGGAAT 
ATTGATTTATCTCGCCTTTTTAATTATCTATCATTATTCAAAATCCTTGTA 
TCATCTAGCATACTTACCTTTAGTATTGATATTTACTTTGATGTATCCACT 
TCTCGTTTTTCTTGCGCAACGTCCATTTTTTTCCTTTAGTTGGAACGAATT 

TGCATGGTGGTTAGTTGCACATCTTTTTTTCATCATTTTAATGGCGACTTG
TCTACCTATCATTTCGAAAAAAATTTTATGATTTTAAAACTCATATCTCAA 
TGTCTTTTACACATTATAAAATAAATATTGGCAGTAGAAACATCATAGATA 
ATTAGACTATGATATCTCTACTGCCTTATTAGTTATTCTGGCTCTCCATCT 
TCATCATACTTCGTTACTTCTCCATCTTTATCGATAATATAAGATCCTTCT 

AAATGGCCTTCTTTATCTGTAAATGAGAACCCCCAACTACCATCGCTATCT
TTTTCTGGTTCTTTGTAAATATATGTGTCTGTATCTAATTGATGCCCTTCG 
TAGTCTTCAACTATATCGATAACATTACTTCGTGTCACCGTTGCTTCATCA 
GAACTACTTGATTGATCTTCAGATTCAGAAGTATCATCTTCATCTGTTAAA 
TTATCACCAACTGAAAATTCAGTCCTCATCATCTGACTTATCTTATTGATT 

TCACTTTCACTAGCGTTGTATAACTTAACTGTACGAGCATCATCTATAATG
CGTTGAGATTCTCTTTTAGAGTAATCAGCGTCACGCCAACTCCCTTGGAAA 
TGTGATGGTACACTATATATTGTAATTGTGCCATCTCCATTACTTTTATAA 
TAAACTGAACCTGCTGCAGTAACACTTGCTGTTAACAATTCTGTTCCTTCA 
GGTAGTTTGGCTGATTCGTCTGGGTGATAGGGATCTAAAACTTTATTAGAA 

ATATTTTGATGTACAATTTCTAAATCGTCAAAAGGAAGGTCACTTTCACCT
CTATAACTATCTAAAGCAGTTAACCAAACTTGAGCATAGTATCTATTCGAA 
TGGGTATCTTGTTGTGTGTCCTGCTGATTATCTTCTGTTTGATTTCTACTT 
ATACTTGATTTTGAATCTCTGTTACTTTTATATCCTGATTGTTGATCATCA 
TTTATCGTTGAGCTTCCATTATTATCACTATCACCAAGACCACAACCAGCT 

AATATTAATATTGTAAACGCACATATCACTGCTAATTTCTTCAATTCAAAT
ACTCCCTCACATTCTAATATATTTAACTCAACAAAATAAACCACTTTATAA 
AATAATTATAAACTAACTATTTTTAAAAATGTACAACAATTTAACAAATTC 
TTAAAGATTCGCTATTTACGAAGCAATACTTGCCGTCTGTTTTACTGCTAT 
ATCATTAGCAATGGTTTATGATTATTAAGATTTTTAACGTATCTTAAAATT 

TGTATTTGAATTCTTTTAAACTACTATCATCTCGCAATCAAGGTGCTAAAC
TATAATCGTAAAGTAATTTAAATACATCATATATCATCAATTCGAAATGGG 
GTTATCAATGTGAAAAAGATTTCTCTAATTGCAACGACTGTATTAACAGGA 
TTATTATTATTTCCAAGTGTCAATGACACAACGACACATGCAGCCGAAGTA 
ACATCTCATGATGCAAAAGTGCTTTAAATGAAGCTCAGCAGTCATTGTCAC 

AATATGACAGACAACTCAATGACAGCATGAATGCATCATTTGATGGTAAAA
TTAACATTAAAAATGATTCAGATGTAGGCGAAGGGCAACCTATTTTGCAAT 
TAATTTCTTCAAATCCTCAAATTAACGCAACTATCACAGAGTTTGATATTA 
ATAAAATTAAAGAAGGCGATGAAGTAAATGTCACTGTAAATAGCACAGGTA 
AAAAAGGAAAAGGAAAAATTCTTAAAATAGATGAACTTCCTACAAGCTATG 

ATACAAGTGACGATAGTACAGCATCATCGGCACAAGCAGGGGCACAAGGTG
ATAGTGAAGAAGGAACTGAAATGACGACATCTAATCCTACAATTAATCAGC 
CAACAGGTGGTAAAAGTGGCGAAACATCAAAATATAAAGTTATCATTGGTG 
ATTTAGATATACCCGTGAGATCAGGCTTCTCTATGGATGCTAAAATCCCTC 
TTAAAACTAAAAAGCTACCAAATAACGTGTTAACAAAAGATAATAACGTAT 
TTGTCGTTGATAAAAATAATAAAGTTCACAAACGTGAAATTAAAATTGAAC
GTAATAATGGTGAAATCATTGTGAAAAAAGGATTGAAATCTGGCGATAAAG 
TCCTTAAAAGTCCAAAAGGTAATTTAAATGATGGAGAAAAAGTAGAGGTGT 
CATCATGATTGAATTGAGAAAAATCAATCGACACTTTAAAAATGGAAATGA 
ATCAAATCATATTTTGAAAGACATAGATATTCATATTGATGAAGGTGAATT
TATTGCTATCATGGGTCCGTCTGGTTCAGGTAAAAGTACGTTAATTAATAT 
CTTGGGATTTATTGATCGTGGATATGAAGGAGAATACTTTTTTAACAATGA 
GAATTATCAAAAAAGCTCAGATAATAAGCTCGCAGAAATTCGAAATCATAC 
TGTAGGTTTCGTCTTTCAGAATTTTAAATTGATTCAAAATAACACTATTTT 

AGAAAATGTAAGTATTCCGCTAATTTATAATGGTTTGAGTAATAAAGCGAG
AAAAAGCAAAGTTTTAGACGGACTTCATGACGTCGGTCTCAAAGGTAAAGA 
GAATCTCTTACCAAATAAATTGTCAGGAGGGCAGCAACAACGTGTAGCTAT 
TGCCCGGGCAATTATTAATGACCCTAAGTTTATTATTGCCGATGAACCTAC 
TGGCGCTCTTGATTCGAAGACATCTCAAGATATTATGGAGCTTTTCGT7U\A 

ACTGAACAAAGAACAGAATACAACTATTATTATGGTAACGCATGATCGTGA
AGTGGCGGAAAAAGCTGATCGTATTATCCATATTTTGGATGGGCGTGTGCA 
AGAGGAAGAGGTGTTGACACGTGAATAACTTTTCAAATGTCATATCAGTTG 
CTATTCGTTCCATTTTAAAAAACAAACG 


Sequence 3341 

step.1000f12.cons.ok

AAAATTAGGTCCAGAAAGTTTCCCATCTTTCGAAATATCGGTATAAATCAC 
ACCACCCAAAGGTiVAATGCTCAATTTTGGCAACATAATCAAATAAATTGAG 

TTTAGCATCCTCTTTCCATCCATTAATCTTTATTTTCTCTCCAAAAGCATC
TACGGATAAGTAGAGTTTATTTGGAAATTGATGTGTCATATGTGTTAACCA 
CTCTATATCTTGGATACCTTTTGTACCTACAATACAATAGTCTATTCCTGA 
ATGAATATAATTTTCAATTGTTTGTTTTGAACGAATGCCGCCACCCACTTC 
TATAGGCTTAGTGGTCACCTTTCTTAAGGAACGGATGTAGTCGAATTCTTT 

TACCTCTTTTGCTTTAGCCCCAATTAAATCTACAATATGTATACGTTTCAC
ACATTTAAATTGACTATAAAATCGGATACTGTCTTCTACAGATTTTTCCAT 
TTTTTCTTTTGAATCATATTTGCCTTCTGTTAATCGAACACTTGTTGAATT 
AATTAAATCAATAGCTGGCCATAAATCAATCATTAATAAACCCTCCTTTAA 
GCGCTTGATTTAGAATCTCTAATCCATACGTTCCACTTTTTTCAGGATGAA 

ACTGGATACCTATATAATTTCGGTATTGAATGACTCCCGGAATCTTTGTAC
CATAGTCAGCATAAGCTACTACATATTCTGACATTTCTGCTTGATATGAAT 
GAACAAAATACACATCACTTTGCAGTAAGGGATGTGTACTCTTTAATTCAT 
TCCAACCCAAATGAGGAATAGGATGAGATGATTGGATTGGCACTATATTTC 
CCGGGACAAGTTCCAATCCACTAACGTCACCTTCTGCGCTATGTTGAAAAA 

GTAATTGCATACCTAAACATATTCCAATTATCGGTTTATCATGTATATTTT
TAAGCATATCTTTAATGCTTTTTTCTTCTATAGAATGCATCGCATCCTGAA 
AATGTCCAACACCTGGAAGTACGATAGCTTCAGCTTTTTGCACGTCTTTAT 
CATCACATGTCAAAATCACATCATATCCTAAATGTTGAATTGCTCGAGTGA 
CGTTACTTATGTTTCCCAATCCATAATCAATAATCGCAATCATTCAATTAC

TCCTTTAGACGATGGAATACGTCCATCTTCATTTTGTGCAAGAGAAATCTT
TAATGCTCTTGCAAAAGATTTAAATATTGCCTCAATCTCATGATGTGTATT 
TCCACCTCTTAATAAGTCAATGTGAACGGTTAATCGCGCATTAATTATCAA 
TGCTCTAAAAAATTCTTCAACTAGTTCAGTGTCAAAAGTTCCTACCTTTTG 
AGCGCTCAACTTGCTATTAAATGAGAAATATGGACGACCACTAATGTCCAC 

TACTGTTCGAGCAAGCGCCTCATCCATGGGTACATATGAGCAACCATATCT
TGTAAAACTTTGTTGAGTCTTTATTAATTCAAGAAGTAATTGTCCAATAAC 
TATACCTATATCTTCAGTTATATGATGATCATCAACATACGTATCTCCAGT 
GGCCTCAATAGATAAAGTTAATCCACTATGAAAAGTAAATAGCGTTAACAT 
ATGATCTAAAAATCCTACACCAGTATTAATATGAGATTGTGTTCCGTTATT 

AGCCAGTGAAATATTAAGCTGAGTTTCTTCAGTGTTTCTTTTGATTTGATA
GTTCATAACATACGCTCCATTTCTTCACTATTTTTCCAGTAGAAGAAGTTG 
TTCTTCTGTAGCAATAGAGTAACGAACATACGCTTTCATTTTTATTAACAG 
ATATCGTAATAACCTATCTTTATCGGCGTACGTGAATCGATTTTTCATGAT 
TAAATAAAGATTCTATATGTGCAATATGCTCAGCTGATTCTGCCACTTCGT 
TAAATGTTTTTTGTGATAAATGAATGACTGAATGTCGTGTCATAAAGTCAT
Ta^CAGAAAGCCCATTTGTAAATCTAGCTCTTTGATTAGTAGGAAGTACAT 
GACTTGGACCTGCCACATAGTCCCCAATAACTTCAGGAGAAAAATGACCTA 
AAAAAAGTGCGCCTACATATTTAACTTTATCTATATACATTTCAGGTGCTC 
GAGTTTGAATAGATGCATGTTCAGGTGCAATCGTGTTCATAATTAAACATG
CTTCTTCAGTATCTTGAGCTAATATTAGATAATGATGGTTAGCAATACTTT 
GTGATATGATATCTTGCCGTTCAACATACTGAAGTTTCTCTTGTATTATAG 
TGTTCAATTGATTAAGTACCTTTTCATTTTCACTAATCACATAAGTACAAG 
CCATTTCATCATGTTCTGCTTGTGCAAATACGTCATAAGCGATTGCGTCTA 

AGTCAGCACTTTCGTCTATAATCAAGGCTATTTCTGTCGGTTCTGCGATTT
GGTCTATGCCTACTTGACCGAATACAAACTTTTTGGCATAAGCAACATATT 
GATTCCCTGGACCTACGATTTTGTCGACTTTTTTTATAGTTTCCGTGCCAT 
AAGTTAGCGCCGCAATACTTTGTGCTCCACCGACTTGATAAACATGATGAA 
CGCCTGTAATGTAACAAGCGGCTAACACCTCTTGACATATACCGCTATTTT 

GAGGTGGGGTAACAACAGTAATCTCATTAACACCTGCTACTTGAGCAAGTG
TTGCAGTCATTAATACTGTAGACGGATAGCTAGCCTTACCTCCCGGCACAT 
AAATACCTACACGTTCGATAGGATGGTATCGTTCATAACATTCAGTTTGTT 
GAGATGATTCCTGTTTTACCTTAATATTTTCTTGGTACACTTTAATTCTCT 
GATAGCTTTGCTCTAATGCATCTCGTGTTTCATTATCTAGCATGTCGTATG 

CATTTTTTAGTTGGCTTTGCTCTAATTCAAGCTTCTCAGTTTCCACTTGAT
CAAATTGTAAATTATAATTTTTTAAAGCTTTATCTCCCTGTAATTTAACTT 
CTTCACAAATCTGACTCACTATCTCGTACAAAGATTCATCTAGAGATTCAA 
CATTATTAAATTCTTTTAAAAATTGTTGTGCGCTAAGCATAGTTAATAGAC 
ACTCCTAACTGCTTGATTAGTCTCTCTATTTCAGATGATTGCTTAAAATAT 

GATTCTTTATTTGTAATTAACTTAGCGTTAATTTCACTGATATGC



Sequence 3342 

step.1000g04.cons.ok

ACACCATCCCTATCATATATACGGCAGTAAAGATGAAGACCCAAACAACAT
CTAAGAAGTGCCAGTATAAACTTACTATAAATAATTTAGGAGCATTGTATG 
AATCCAAACCACGAGTGCCGATTTGAATTAACAAACAAATAACCCAAACAA 
TACCTAATGATACGTGTGCACCGTGCGTACCTAGTAGTATAAAGAAACTAG 
ACCAGAAGGAGCCAATAGTTGGGTTAACACCTTCAGAAGCATAGTGTGCGA 

ATTCGTAAATTTCGAAACCTACGAATACAAGACCTAGGATAACTGTGATAA
TCATCCAAAACATCATTAAGTTTTGTTTTTCTTGTCGCATGTAATAAATTG 
CAATACCACAAGTATAAGAACTAATT7VATAATGCAAAAGTCATTATTAAAA 
TCAAATGTAATTCGAATAAGTCGGTAGTTAATTTACCGCCATATCCGCCAC 
CATGTTGTAACGTTAATAACGTTGCAAATAGGGTACCGAATAACGCAAATT 

CAGCTGTAAGGAAAATCCAAAAGCCAAGTTTATTTAATTCGCCTTCGTGTG
TACGAGAATCAATAGTATTTGCATCATGACTCATGACTTACAGCCTCCCTT 
TCTTTAATTCGAGCTTCTCTTAAACGAGCTTCAGTTTCTGCAACTTCTGAA 
GCAGGGATGTGGTAACCATGATCAATTTGGAAACTTCTCCAAATCATAGTA 
ATGAAGATACCTGCTAAACAGATAAGTGCTGGAACAATAGATTCGAAGATT 

45 AAGAAGAAACCACCAATAGTCATAAATATACCCATCCAGAATCCTACTGGA 
GTATTGTTTGGCATATGAATATCTTTGTAGTTATGGTTGTCTAAATAATGA 
CGACCATGTTCTTTCATATCAACGAATGTATCGTAGTCATTCCAATCAGGA 
GTGATAGCAAAGTTGTATTTAGGTGGAATAGCTGATGCTGTAGACCATTCT 
AAAGTACGACCAAGTCCATCCCAGTTATCTCCAGTAGCTTCACGTGGAGCT 

50 TTGATATGACTATAAACGATACTTGCAACTAGGAATAAGAATCCAATTGCC 
ATCAATACTGCACCGATAGTTGAGATGAAGTTTAGTAACCACCAACCATCA 
GAAGGCATGTAAGTGTATAGACGACGTGGCATACCATCTAAACCTAGAATG 
AATTGTGGTAAGAAACAAACGTTAAATCCGATCATGAAGAACCAGAAGCAC 
CATTTGTTTAATGTTTCATTTAACTTGTAGCCCATCATTTTTGGATACCAG 

55 AAGATTAAACCAGCTAAGCAGGCAAATACTACACCAGTAACCAATGTATAG 
TGGAAGTGAGCTACTAAGAAATAAGTGTTGTGATATTGATAGTCAGCTGAT 
GCCATTGCAAGCATTACACCAGTAACCCCTCCTAATAAGAAGTTAGGGATG 
AATGCTAATGAGAATAGCATAGGTGACTCAAATGTAATTCTACCTTTGTAT 
AATGTGAGCAACCAGTTAAATAGTTTAACTCCGGTTGGAACACCGATTAAC 
ATTGTTGAGATAGAGAAGAATGAGTTAATTAACGCACCATTACCCATAGTG
AAGAAATGGTGAACCCAAACTAAGAAACTTAAGAATGCGATACCTGCAGTT 
GCCCAAATCATACTTTGATGACCGAATAAACGTTTACGGGCAAAAGTAGGG 
ATGATTTCTGAGTACATACCGAATGCTGGCAAAATAACGATATAAACTTCA 
5 GGGTGCCCCCATACCCAGAAGAAGTTTGCCCAAAGCATTGGCATACCGCCA 
TTTGCTACAGTGAAGAACTCAGTTCCAAAAATTCTATCAGCAGTCATTAAA 
GCAAGTGCTACAGTGAACACTGGGAATGCTAATATAACGATTAATGTTGTA 
ATGAATGTTGTTACACTGAACATTGGCATTTGCATAAACTTCATTGTTGGA 
GTTTTACATCTTAGAATCGTAACAAAGAAGTTGATACCAGTCATTAACGAT 

10 CCGATACCAGATATTTGAATTGCAATTAAATAATAGTTAACACCTGGACCT
GGACTGAATTCACCAGCAAGTGGTGCGTAGTTAGTCCAACCAGCAGCTGGT 
GATCCACCTACAATAAATGATAGGTTGAATAAAATCATACCAGCGAAGAAT 
AGCCAGAAACTAACGTTATTCATTACAGGGAAGGCAACATCGCGTGCACCA 
AGTTGTAATGGAATAACAACATTCCATAAACCAAAGATAAATGGCATAGCC 

15 ATAAATATAATCATAATTACGCCGTGCGTAGTAAATACTTCATTATAGTGG
TTTGCTTCCAAGAATTTGTTATCTGGAATTGTTAATTGAGTACGTAACATT 
AACGCATCGATACCACCACGAACGAACATTAATACGGCAGAAATTAAATAC 



20 

Sequence 3343 

step . 1000g05 . cons . ok

TGTGCTCTAAAGAAGATCCAAATAACATAGTTATTAGACCAACAAATGTCA 
CTGTATAAATATAAACATTACTAAGTCCCAATAAAGGTGTAAATCTAAATA 

25 GTAAAAAGATTCCAGCCTTTACCATTGTTGCCGAATGAAGATAAGCACTTA 
CTGGTGTAGGTGCTGCCATGGCCTTTGGTAACCAAATATGAAACGGAAATT 
GTGCAGATTTGGTAAAAGCACCTAATAATAGCATCAAAATCATTGGTATAA 
ATAAAGGATGTCGTGAAATTGCATTGCGTTGATTAAGGATATCAGTAATTG 
TGTTTGTTCCTGTAATGATATATAAAATGATAAATCCTGTTAATAACGCTA 

30 GCCCACCAAACACTGTAATCATGAAAGATTGAATGGCGCCTAATTGACTTT 
CACCATTATTGTACCAATAGGATATAAGCAAGAATGAGGAAATACTTGT6A 
GTTCCCAAAATACATACATTAAGATGGTATTATTAGCTATTACAATGCCAA 
TCATACTGAACATAAATAATAGTAAATAGATGAAAAATCTAGGAAGATTGT 
CCGTACTGTGGGATAAATATTGCGTAGCATAAAAAAATACACCCACACCTA 

35 TTAGCGAAATAATTAAGCCGAACATTAAACTTAAACCATCTAATCTTAAAT 
CTAAATTAATATCAATTGAAGGCATCCATGGTAATCGAACAGCAATAAACT 
TATTTCGAATCACATCTGGTATTTTCATAATAAAATATGTAGATGTCACAA 
TAGGAGCTATTAATGCAACATAGCCCGCAACCTTTCTTAGTTGACGATGAC 
TCAGAGTGAATAAAACTATGAGCATGACTAATAAATTAGTCGCCATAAGAT 

40 ATACTAAACTCATTTAACCCCTCCTTTTCTTTTTACATTTATATTAAGTGT 
AATCATTTATTAAGTGTAATCATTGTATGGCGATTTGAAACATATCGTATT 
TCAATAAATCTCAATATTTCTTAAGTTATCTGATATCCAAAATATTGTTGA 
ATGATAGAAATTAATATTCCCGATTGAAAGAGAAGATAATCAAATGTTTTT 
CTTAAATTAGTTAAACTAACAGGTTTTAATCCTTATTATGGTGCAACACGA 

45 TTTATGATTTGATATTGCTAGTTGATTTGAATGAGTTTATTGAGCATACAT 
AGATTCAAGCACTGTCGCTTTTTTTCTGAATTGGCATATCACAAAAAGGTA 
ACTATTCTTATTTATATCCAAAAGGGAATAGAACTTTATTTTATATAAGAT 
AGAATCCATGAGGCTATAATTTTTCAATCATCGATAAACTTAAACTAATTC 
ATTCTCATATTGGGAGATCTTAATATTTTAATTCATATATAAATAGTAATT 

50 TTAGACACTGATCAATCTTTTAAGTTGGTTATTTTAAATATCAGGTTCCAA 
TTTTTTCTTTTACAACTTAACAATTCTACTTACCTCATATACCTTCTAAAA 
CAGGTTTCATAATTAATGATAAATGACGTTTGTATCACATCAAAGGCATT^ 
GCCTATATTTTTCAAAAAATGTGTCCTTTTTTTTCAATCTATTTCGATATA 
TATAGTGAAAGGAGGTTTTATTATGAATGAAGTCAATCAAGTGACGATTAT 

55 CTTTACTCAAATCAAACATAATTCTGAAGGTAAAGTAGTCGAACATAAGCG 
TCGTTTTTCTAACATTAATCCGCTTGTTTCTAATGATCAAATCAAATCTTT 
TCGTTCCATCATTGAACGTATCAGTGGCGAAGAGTATGACAAAGTCGAGAT 
CGTTAAAACAGAATCTTTAATTTAAGGAGGATACATCAGTATGTCAAAAAC 
TTTGGAACTCGTATTTAAATCAAATCTTAATAAACCCGTCAAATTACTATT 
ACCCGATTTTAACGCTATTACTACAGAACAATTAATTAAAGAAAGTATGAA
CCAACTTTTAGAATTAGACATTTTAAGATTTAGTTTGGGAAAGCCTGTCAA 
AATTTATGCTGCACAACTTATTGATAAAAGTACAACTGTTATATTTGAAGA 
TAAAAATTAATTCATAAAACATTATTTTTGACGCTTTATTTAAAGTACTAA 
5 AATCATAAAAAAGACTTATCGAAGAACAAATTTTTTAAAATTGTTCAAGTC 
TAAAAAAGAACTTAATTTACATTGTTCTTTTATTAATATGAAGTGAGATAT 
CAAATTCAAAATGATGTCTCGCTTTTTTTAATTCTAAAACTTGTTTAATGT 
GCATTTTATTATAATGATTCATCAAAATAGTATCGGCTGTATGACACCTCG 
GAACATTCACAACTTATTTCTATACTTTGCTTATGGCTCAATCATTTTAGA 

10 AACATGATTATGAATGCAGTTTTTCTATGATAATCATAAATTCTATATGAA
ATATCTGATGATTGCATGACTTCATACGCCATAATGCCTGCACCTATGGAA 
TGTATCGTTCCAGCTGGTATAAAGTAAAAATCACCTGGTTGCACTTGTATT 
GTTCTCAAATAATTTTCAAATGTACCTTGCTCCAATTGCTCTTCAAATTCA 
TCACGAGATTTCGCATACGTACCTATAGTAATCTTTGCACCTTCATCAGCT 

1 5 TCAATGATGTACCAACAITCAGATTTTCCATATTGACCTTCTTCGTGTTCA 
TATGCATAAGAATCGTCGGGATGTACATGAATAGACAATGGCGCAGCAGCA 
TCTACAATTTTAGCCATTAATGGAAAATCTTTACTTGGAAAATCTCCAAAT 
ATTTCTCTATGATTGTTCCATACTTGATCCAATGTTTGACCAGCAAATATA 
CCATTTTCAATCACACTTTTTCCGTGTGGATGTGCTGAAATTCCCCAACAT 

20 TCACCTATGTGATTATTAGGTAGTTGATACCCAAATTGACGAAGATTATCA 
CTGCCCCATACTTTATCCAGAAAAATGGGTTTTAAAAATAACGGCATGTTC 
ACACCTCCATTATTAAAAGCTTAAACTTAAATCATTATCTATGCTTGAATA 
TCACACAATGTTAAATAATAAAACTACTTTTGATACAACATAGCCCTTCTT 
TTAAGATATGTATTTGAAGCTAACTATCTTTCTCACTCGTATTGA6TGTAG 

25 CATATATGATTATTTCACAAACTACTATATGAATATACTATCATTTTATAA 
ACTTTCTATCCTGTGTCCATTTGACGTAGGATTCTTACATATTATATCAAC 
TGCCTAATAAATTGAATAGCATCATTTTTACTAATCTAATAAAGCGTTTTA 
TTGATTACTATACCATTATTTTCTGAAAATGACCCTTTAAAATAACAATGC 
TTTAAATTGGATTCAATAGAATCAAGCATTACACTTTTTTCAAGTTCATTT 

30 AAAAGTAGAAAATTTAGTATAATTATACTCATTAAAATATTTAACTTTGAA 
TATAAAAAGGGAACAAGATAAGACTTTTGATTTCCTTATCTTGCCCCCTCA 
AATAC 
TCTTCGTTTAATTGATAGTTCTCACTAATGAAATCGATGGGTTTTTTATAT
CTATAAGCTTTAAATAAAACTTTAGCAACTTCTTTATTTGTTAGTGTCATA 
TTGTACCCCTCCCCTAAATTATCTGAAAATTTATACTATTATGATAACTTT 

40 TAAATGTGAAGTGTGCAAATTGATTTAAATATACTTCATTTAGGGTATAAA 
ATAATTAATAATGTAAATTTTGATAGGAGTGGGTATAGTGTCAGATTATGT 
TTATGAGTTGATGAAACAACATCATTCAGTTAGAAAATTTAAGAATCAACC 
ACTTGGTTCTGAAACGGTAGAAAAATTAGTAGAGGCGGGACAGAGTGCTTC 
TACATCCAGTTATCTTCAAACTTATTCTATTATTGGTGTTGAAGATCCAAG 

45 CATTAAAGCGCGTTTAAAGGAAGTGTCAGGTCAGCCTTATGTTTTAGATAA 
TGGTTATTTATTTGTATTTGTTTTAGATTATTATCGTCATCATTTAGTAGA 
TGAAGTTGCGGCGTCAAATATGGAGACATCATATGGTTCTGCAGAAGGACT 
ATTAGTAGGTACAATAGATGTTGCATTAGTTGCGCAAAACATGGCAGTTGC 
TGCCGAAGATATGGGGTATGGAATTGTTTATTTAGGGTCATTGCGTAATGA 

50 TGTTGCGCGAGTGCGTGAAATTTTAAATTTACCTGATTATACGTTTCCGTT 
ATTTGGTATGGCAGTAGGTGAACCTTCTGATGAAGAAAATGGGTCACCTAA 
ACCGCGCTTGCCATTTAAACATATTTTTCATAAAGACCAGTATGATGCGAA 
TCAGCATCAACAACGTAAAGAATTGGAAGCATACGACCAAGTAGTGAGTGA 
ATATTATAAAGAACGTACTCACGGTGTGCGTACAGAAAATTGGTCACAACA 

55 AATAGAAACATTTCTAGGACGTAAAACACGTTTAGATATGTTAGATGAATT 
GAAAAAAGCAGGATTTATTCAAAGATAAAAAGAACCTGAGACAGAAATACA 
TGTCTCAGGCTAGGGTGGGGCGATATTTTCAACACGAATGTTGCCCCGCTC 
TTTTTATAATTGTATGTCGTAATGATCAATGACCACTTTTCTTACATTTAA 
TTTGCAGATAAATCGCCGTAATCATTTGAGTCAAATGTTTTTTTGTCCC^ 
TGATTCGTTAAGCGTGCGGTACCTGTTCCAGCTAGCATTGAGTCATTAACG
TTAAGTGCTGTACGACCCATATCGATGAGAGGTTCGATAGATATCAGTACC 
CCTGCGAGAGCAACTGGTAGATTAAGTGTAGATAATACGAGTATTGATGCG 
AATGTTGCCCCGCCACCTACACCTGCAACGCCAAATGAACTTATAATAACA 
5 ACAGCAATAAGTGTAACAACAAATTGGAAGTCAATTTCTACATTTGCTACT 
GGTGCCACCATAACTGCTAGCATAGCAGGATAGATTCCTGCACAGCCATTT 
TGCCCTATGGATAAACCAAAAGTTGCAGAGAAGTTTGCAATTCCCTCAGGT 
ACACCTAAACGTTTTGTTTGCGTTTGAACATTTAACGGTAATGCACCTGCA 
CTTGAACGTGAAGTAAATGCAAAGATTAGTACTTCTATTGTCTTTTTCACG 

10 TATTTAACGGGATTGATACCTAAGACACTCAGTATAATTAAATGGATAATA 
TACATTGTGATTAGAGCTGCGTATGAAGCAATTAAGAATTTACCTAACGTC 
CAAATTGCAGAAAAATCACTTGTCGCAAGAGTAGAAGCCATAATAGCTAAA 
ATGCCATAAGGCGTTAATCGTAAAACAAAAGTTACGATAGCCATAACGATA 
GAATAGATTGCTTCTATACCACGTTTAAGTAAGCTTCCATGTTCCGGCTGT 

15 TTTCTTGCAACTCTAAGATAAGCAAAGCCCACAAACGTTGCAAAAATAACA
ACTGCAATTGTCGAAGTTGTACGTTGTCCTGTGAAATCTAAAAATGGATTG 
CTTGGGAATACTTCGAGAATTTGTTGTGGTAAAGTGTTTGCAGTTAAATCT 
TTGGCTTGTTTTGAAATTTCTGTACCACGTGAATGTTCTGCACTACCTAAA 
TCAATAGACGATGCATCTAAACCAAAGATCAAAGCGTAAAAAATTCCAACG 

20 ATAGCTGCAATGGCTACAGTACCAATTAAAAACATAAAAATATAAGAACCG 
ATCTTAGCGAATTTTTCACCAATTTGTATTTTGCTAAAAGCGGCAACAATT 
GAAATGAATATTAGTGGCATGACAATCATTTGTAATAGTGCAACATAACCA 
TCTCC7VACAATACTAAACCAGTCTGTTGATTGTTCGAGAGTTTTAGACTCT 
GCACCATATATGAGGTGTAATACAATGCCAAATACGATACCTATTGCTAAA 

25 GCGGTAAAGACACGTTTAGGGAATGAGACGTGTTTTCTTGCCATAATATTT 
TUVCATTATTAGGAAAACAAGTAACACGATGACATTAATTACTGTAAAAAGT 
ATTTCCATATCGATATATAGCTCCTTTTTAATTTTTAGTATTCCGACATAT 
GTAATAAGAATAAAGGGGGTTAAGAAGAAATTCAAGTTGTTGATTTATAGT 
TTTCAGATTCATTTAAAGTTGAATAGAGTGAAGGTGATTACTAGAACAATG 

30 AAAAATGAGAAGTTTATGTTTATCAGAAAAGTGAGCAAAGAGGACGTAAAG 
AGGTTTATATTAATAAAAAAGACGCGTGCAATCATGCTATAAGAAGACATG 
ATTGCACGCGTTTTTATTTAATATTATGAATAAACATGAGTCAGCTATGAA 
TACAAATCATTTATAAGTTAAATAGACGTGCAAAGAAACCTTTCTTCTCTT 
CTTTAGGTTGTTGTTGGTAAGTCGATTGAGAAGAAGGTGTAGCTGATGTTG 

35 ATTGAGAATGAGAAATTGACTCAGATTTTTCTGAGTTAGATACACTTCCTG 
ATTCTGACAAGAACTCGCTTGTAGACGTACTTTTTGATTCAATTTCCGATT 
TGCTATTTGTATTACTTTCGGATGTTATTGAGTTAGACTCGGACAGAGATG 
AACTCATTGAATCATTTGCTGATTCAGATTGGCTCAAACTTGCTGTAGTAC 
TTTGACTTGCGACGTTGCTCTGAGATATTGAAGCGCTATTTGACATTGAAT 

40 CAAAATGACTACGTAATTCAAAATATTACATTTAATAAAGATATAACACCT 
AATTACGTTTGTACGTTAATAAATGAAGATTCCAGAAAAATAATTTTAAGT 
AATGTAGACAATCCGAATATCATTATTGAAATGATTCTTATTAATTCAAAA 
AAATTGGTTTTTAATGCTATAAGTAAAGAAGGCTTAGGTACCTCGCCTAAA 
ATCACATTTACAAAATAATTTATACCCTGTAACTCTTTCATTAAAAAAACA 

45 ATATGATATAACTAAAAAGCTCCCAGTAAAGGTCACATTTTAAATGCT 
50 ATAAGCTGTTGTGGCAGCAGCAAATGTGCATCCTGCACCATGATTATAACT
TTGTTGGAACATGTCAGTAGTTAATTGATAAAATTGTTGGCCATCATAGTA 
CAAGTCATAAGATTTATCTTGATCGAGTGCTTTACCACCTTTAATAATGAC 
ATGAGGTGTGCCTTTGTCATAAATCACTTGAGCAGCTTTTTTCATATCCTC 
AATTGATGTTAATTTTCCTAAACCAGAGAGTTGACCTGCTTCGAATAAATT 

55 CGGGGTAACAACTGTAGCTTTAGGTAGTAAATATTGAATCATTGCTTCTGT
GTTTCCTGGGTTAAGTACTTCGTCTTCTCCTTTACAAACCATTACTGGATC 
AACTACAAAATAGTCTGCACCAGATTCAACAAAAACATCTCCGGCACGTTT 
AATAATGTCTTGTGTCCCTAACATTCCTGTTTTAATAGCATCAGGTCCAAT 
TGATATTGCAGTTTCAAGTTGTTTTTCGAAAACATTCATATCAATAGGTGT 
TACATCATGGGACCATGTTTCTTTATCCATTGTTACAATAGATGTTAAAGC
GACCATTCCATATACATCAAGTTCTTGGAACGTTTTAAGATCTGCTTGCAT 
ACCGGCACCAGCACTTGTATCTGAACCAGCTATCGTTAATACTTTTTTTAA 
AGCCATCATTCATTCACTCCCATTAATTTCTAGTGTCTTTATCATATCATG 
5 TTTATCGCGTACGCTAAATTATTATAATTTTAAAATGCAAATCAATCATCA 
TACTTAGCTATTGTATTGATGTTATCTTTAATTCGTGATTAATAGGTAAGA 
ATGTCATCTCATTCTTACTTAAAAGGTCTTCAGTTAATTTTATTTTGAAAA 
TTTGATTGCTAAAGTAAGGCTTAATATAATAACTTCTTTCTTTACTTTCCA 
TCACTAATTGATATTGTGTGTAATGTATTTTATTGGCATCGATAACTGCAC 

10 CTTGAGGGATACTGACTGATTCTAGAACTTTT^AAACAATTCATTAAATTTT
CATCTTCATTATGGGAACAGCGTAGTTGGTGTCTTAAATATGTAGCGCGTA 
TAAAACGATCTGTTGACGTATAACCACCCGGTAAGCCATTTGTTCCTGCTT 
CACAGCCCATTGATCTTACTAGCACTTTACCTATTAATTGATTGGTTGATT 
TCTGTGGCGTTAAAAAAGCGTAATTTCTTAAGTTAGATAGATGCCAATCTA 

15 ATTTAGGTTCATTTGTTAAGGTATGAACATAATTATCTTTAACTATTAATA
AGCCATTGTGAGGTTCTATGGCTACGGTATGTCCTGTTTCATCAGTGACCA 
TGAAATGTAAAGGAGGAACGATATTCAAAGTCGTATTTTTTTCATTCATAA 
TATTGATTTTCTTAACCTTTTGTTTTAATTCGCTAATACTTTTATTAAAAC 
CTAAAACCCAAACAATAAACTCCTCAGGTGCTAAGTTAAAATAACCATAAC 

20 GTTTATGGGTACTGTATGAGGCTTCACCAGTGAAGTAATGGTTCGAAATAG 
CTAAACCTTTTTCGTTTATACCATCACCAAATCTATAACGTCCTACTTTTA 
AATTTGTTCCAACAAAACCATATTCAAGACGCATGTCTGAATCTAGATCAA 
ATTGGTAGTGATAATGGCGTGGAACAATCGTTGGGATACCATTAAATTCAA 
ATGCAAAGTCCATTGTTCTAGCTAAATAATGGTAACGTTGTTTTGTATATA 

25 AAGAAATGGCAGTACACATATATAACACTCCAAACTACTGAAATTTATCTA 
TATTAATTGTAATATAGATAAAAATGCGCTTATTGTAAACCTGAATCATAT 
AAAAGTTTACAAAAATTAATGCTTATAAAATCAGTCATTTTTCTAATATGA 
ATATGTTAAAATGTAGCTTGAGGTGAAAACGAATGAAATGGTCAGAGGTAT 
TTCATGATATAACAACGCGCCATGATTTTCAGGCGATGCATGACTTTTTAG 

30 AAAAAGAATATACGACTCAAACCGTCTATCCAGATATACAAAATATCTATC 
AAGCATTTGATTTAACGCCGTTTGAAGATATCAAGGTTGTTATTTTAGGGC 
AAGATCCTTATCACGGTCCTAATCAAGCACATGGTTTAGCATTTTCAGTGC 
AACCTCATGCTAAATTTCCACCATCTTTAAGAAATATGTATCAAGAACTAG
AAAATGATATAGGGTGTCATAGAACTTCGCCTCATTTACAAGACTGGGCAA 

35 GAGAAGGTGTCTTGTTATTAAATACGGTATTGACTGTTCGACAAGGTGAAG 
CACATTCACATCGAAATATTGGATGGGAAACATTCACGGATGAAATCATAC 
AAGCTGTTTCTAATTATCGTGAGCATGTTGTTTTTATTCTGTGGGGAAGAC 
CGGCTCAACAAAAGGAACGATTCATTGATACATCTAAACACTTAATCATTA 
AATCGCCACATCCTAGTCCACTATCGGCTTTTAGAGGATTTTTTGGTTCTA 

40 AACCTTATTCAACTACAAATAACTATTTAAAATCTAAAGGGAAAACACCAG 
TTCAGTGGTGTGAAAGTTAGGTGAGAATAATGAACAAAGAACAGATTCTAC 
AATTGATTGAGCAAGAATTGATACAAGCAGATGAAGCTCAGACAGATACGG 
AATTTGAAAAGCATATGTATGCTATACACATGCTCACATCTCTTGTTAGTT 
CTCATCAAAGTCGTTCTACAATAGAGAAATTAAATCATTCTAAACCAATGA 

45 ATAGTAATATCAAAGATGATTATGAGATGAAACAACAGTCTTCACAAAAAC 
ATCATGTAACTGCAGCTGAAATAGAAGCAATGGGTGGTAAAGTACCACAAT 
CAATGAAAAAGCATCATACTTCTAATAATATGATGATTACAGATGATCAAG 
TTGGTAATGGTGAATCTATTTTTGATTTTTAAATATAAAGGAGTACATTCA 
AAATGATGAAAGTTTTT 

50 

Sequence 3346 

step . 1001a09 . cons . ok 

CTTCTTAAATTACGTAATTTGTATCCAATATCCATCTTTCTAACCACCTAA 
55 AAAAGCTATATCATGTTTACTTATATCAAACATTTTGTTTAATGCTTAATA 
AAAATAACAGAATATAGCTACGTAATCAATAGTTTTATATTAACTTATTAT 
TGTCGATTTAATAGAAAAAGTTAGTCGAAATCTTTACTAAATAAAGTTTGA 
TTTGGATTTTTTTTAATTTCTTGAACTACTTTGTAACTATTATAACCACT^ 
TTATTTTCAAATTCTTTAAAAATCTGTTTTTCATCAGCTTTTCCAGGCACA 
ATATGTTTAAAATTTTTATAAGCATTTAAAACTTCGTCACCTTTTACAGAT
GCCTCATAATAATTTTCGATTTTATTAAAGAAATGAATTACTTTCATCATT 
TCATCATTAGTCCAATCTAAATCTAATGGATATTGATATTCCATAGTGTCC 
TCTCCTTCATATACTTTTATTTTTCACATTTAGATATATTAAAAATAACAC 
5 AGTATACCTTTATTGGCATAAAAAGATGGGATAAAATCCATTTTTGAATTC 
TATCCCACCATGAATAAATACCCCTAGCAGATTGCTATTATTTAGAGTAGC 
TAAACGCTAAGGGTATTTTGAAATGATTTAAGATACGTATGTCTTTGCCAG 
TCCGTGACTATACAAACGAAAAAGTAATATACGTACAAATCATAAATAAAT 
CATATTAATTTATATACATTTATCATTATTAACTAGTAATTACATAGTATG 

10 AATTGGATAACCAATTGCTTTTTCAGCAGCTTCCATTGTCATTTCACCTAA
AGTTGGGTGTGCATGTACAGTTAATGCGATATCTTCAGCATTCATACCTGA 
CTCAATAGCTAAACCTAATTCAGAGATAATATCAGATGCGCCAGTACCTAC 
AACTTGTGCTCCAATAAGCGTATCATCTTCTTTAAGTGTAATTAACTTAAC 
AAAACCATTTGTATCATCTAATGATAAAGCTCGTCCATTAGCTGCATAAGG 

15 GAATTTAGAAGCTTTAATTGATAAACCTTCTTCTTTTGCTTGAGCTTCAGT
ATAACCAACTTGTGCTAATTCTGGTTCTGTAAAGCAAACTGCTGGCATACC 
AATATAGTCTACCTCTGCGGCTTGACCATCTATCGCTTCAGCAGCAACTTT 
ACCTTCATAACTAGCTTTGTGAGCTAATGGTAATCCAGGTACAATATCTCC 
AATCGCAAAGATATTTTCAATAGAAGTACGACTTTGTTTGTCCACTTCTAG 

20 TAATCCACGATCAGCAAATTTCAGACCAAGTTCTTCTAATCCTAATTCATC 
AGTATTAGGGCGACGGCCAACTGTAACTAATACATAATCAGCTTCGATAGT 
TTGTTCCTCACCTTTTGCCTCATAAGTTACTTTGACACCATTTTCAGTTTC 
TTCTGCAGATTTTGCCATTGCTTCAGTAACGATTTCGATACCTTTTTCTTT 
CATACCTTTTTTAACAGGTTGTGTCATTTGCTTTTCAAATCCGCCTAAAAT 

25 ATCTTTTGCACCTTCAAGGATAGTAACTTCAGAGCCAAAGTTTGCAAAAGC 
AGTACCTAATTCAGAACCGATATATCCGCCACCAACTACAACTAGTTTGTT 
AGGTACTTCTTGTAGATTTAAAGCTCCTGTTGAATCGATAACACGTTTACC 
AAATTCAAAATTTGGAATTTCAATTGGTCTTGAACCTGTAGCTATAATCGC 
ATGTTTGAAATTGTAAGTTTGAGCACTCTTTTCGTCCATGACACGTAAACT 

30 ATTGTTATCAACGAAATAAGCTTCACCTCTAACAATCTCTACTTTGTTACC 
TTTTAAAAGTCCTTCAACACCGCCAGTTAATTTATTAACTACAGAAGTCTT 
GAATTCTTGAACTTTTTGATAGTTTAACGAAACGCTTTCAGCAATTACCCC 
TAAGTTTTCTGAATTTTGCGCTTCAACAAAGCGATGAGAAGCATGTAGTAA 
TGCTTTTGAAGGTATACAACCAACGTTTAAGCATACACCACCTAAATTACC 

35 TTTCTCAACGATTGTTACCTTTTGTCCTAATTGAGCCGCGCGAATGGCTGC 
GACATATCCACCTGGACCTGCTCCTATTACAATAGTATCTGTTTCAATTGG 
GAAATCTCCAACTACCATGTTTTTACCCCTCCATTAATAATAATTCTGGAT 
TATTTAATAAGCGTTTAATGTGATTCATAGCATTTTGTCCAGTAGCACCAT 
CGATTTGTCTATGGTCAAAGCTTAATGATAAAGCTAACACTGGTGCAGCTA 

40 CAATTTCTCCATCTTTAACGATAGGTTTTTGAGCGATACGGCCAATTCCTA 
AGATAGCTACTTCTGGGTGATTGATAACTGGAGTGAACCATTGTCCACCAG 
CGGAACCGATATTACTAATTGTGCATGTTGCACCTTTCATTTCTTCTGAAG 
TTAATTTACCATCACGTGCTTTTACAGCTAGTTCATTAATTTCATCAGAAA 
TTTCGAATATTGATTTACGATCGGCATGTTTAACTACTGGTACTAATAATC 

45 CTTTATCCGTATCTGCAGCAATACCAATATTCCAGTAATGTTT6TGTACAA 
CCTCTCCAGCTTCTTCATTGAAAGAAGTATTAAGTGCTGGATATTTTTTAA 
GTGCAGAAACTAATGCTTTAACAACATATGGTAAGAAAGTAAGTTTTGTAC 
CTTGTTCAGCAGCAATTTCTTTAAATTTCTTACGGTGATCCCATAATTCTT 
GCACATCAATTTCATCCATTAATGTAACATGAGGTGCAGTGTGTTTAGAAT 

50 TAACCATTGCTTTAGCAATTGCTTTGCGCATTGCAGGTATTTTTTCTGTAG 
TTTCAGGGAAGTCGCCTTCTGGTAATGCTTGTGTTGCAGAAGCATTAACGA 
CATCACTAGAAGTTGATTCAGAT6CTGCGCTAGTGTTTGAACCTTCTTCGG 
AACTACCACCATTTAAGTATGCATCGATGTCTTCTTTTGTGATTCGTCCAT 
TTTTACCAGAACCATTTACAGCTTTAATATTGACACCATTTTCACGTGCAT 

55 ACTTACGCACTGACGGCATCGCTTTAACAGTTTTACTTTCATCTACTTCTG 
TCTTTTCTTGTGATTGAGTTGATGAAGCTTCTTCTTGCACTGGTGATTCTT 
GTTCTTTTTCTTCTTTCTTAGAATCCTCATCATCGCCATGACCTTTAAATT 
GCATTTCTTCTGCATCAGGTGCATCAATTTTAACGATGACATCTCCTACTA 
CTGCCACTGTTCCTTCATCTACTAACACTTCTTCAACAGTACCACTTACTG 
GAGAAGGAArrTCTACTACAGATTTATCATTTTGAACTTCTGCTAATACAT
CATCTTCTTCAATTGTATCGCCGGCTTTAATAAACCATTTAACAATTTCAC 
CTTCGTGGATACCTTCCCCGATATCGGGTAATCTAAATTCAAATGCCACGT 
TCTTGTCCTCCTAAGATTTACGTATTTAAATTT 

5 

Sequence 3347 

step. lOOlbOB .cons .ok 

AATTTTAACCTCTCAACAAAAAGTCGTGATGGAACAATTTTTTACACCTAC 

10 ACAAAAAGTTAAATTTCAAAAATATTACAACCCTGAGCACGAACATCCTAC
GGTACAATCTATCATTTATAATACTTCACGAGACGTTCGTTTTTTCAACGA 
TGAAAATGAACTTTTAGCGTTTGCAATTAATGCGCTATATCATTTAGGAGA 
CGTATTTTTATGTGATAAAAACATCGTTACAGGGCCAATCATTGATCAAAC 
TGATACTAAAATACCAGTTCTCGCCGTTTTCCACAGTACCCACGTAAAAAA 

15 TATCAATGATATATATCATTCTGAAATCAAACAAGCGTATAAACCTGTTTT
AGATAACTTATCCCGATATTCAGGAATCATAGTATCTACTGAACAACAAAA 
AACAGATTTATCTGTAAAAATTAATAACGTTATTCCCATTTACGTTATACC 
TGCAGGTTATATTGATACAAATGAATCTCATCATAGTAGTGACAATAAACC 
ATTGCCTAACAAAATGATTTCTATCGGCCGTTATTCTCCTGAAAAGCAATT 

20 AGATCATCAAATAGAACTAATGTCTAAGCTAGTTCCAGCATTTCCAAATTT 
ACAGTTACATTTATTTGGGTTTGGTAAAGAAGAAACACATTATCGTAAATT 
AATAGCTCAATATCATTTAGAAAATCACGTGTTTTTACGTGGCTTCATTTA 
TGATTTAAATCAAGAAATAGAGACCGCCTATTTATCTTTATTGACAAGTAA 
AATGGAAGGGTTCAATTTAGGTGTACTTGAAACGATTGCTAAAGGCGTACC 

25 TACAGTAAGTTATGATACCAAATACGGACCTTCTGAGTTAATTGTTAATCA 
TAAAAATGGCTTTTTAATTGAACAAGATAATAAAGAACAACTCTATCACAG 
CGTTAAAAAGTTATTACTCGATTCTAACTTAAGAGAACAATTTTCTAAGGA 
AAGTATTAAACATGCCCAAATATTTAATGACAAAAATGTTTTTGATACATG 
GCTCACTGTTTTCAGAACGTTAAAAGTTAATTTATAATCGCCAATTTATTA 

30 AGATATATAAGGGTGAAGAGAAGTGTGAAACATCATACATCATCTTCACCT 
TTAATTTATTCTATAATATTTCATCTTTAATCTGTGCCTTTTGATATTGAA 
CATAGCCTGATATATAATGTTCGATATCATCAATCCGAACTCGATACCCAT 
GGTGTTTTATGAAGAAACTACCAAGAATTCGCTTTGGATTTTTGAAGTACA 
TTGCAATTTCAGGATAGAAAAATCCTACTCGTTGATATTCTGCTCTTGTAT 

35 GAATCACATTGATTAATTTTTCTTCATCAATCAACTGAGTCACAAGCTCTT 
GACGATGTGTAGCTTTTGCTTTTTGAATAAGCTTATAGGTTGCCATTAACA 
TTTCTAAAAATGTTGGGAATGTTGTTTCACGTTGTTCAATATATTCCAAAT 
AGGTGTTCACATTCTTAATCCCAAATTCAAAATATTTATCTTGCGGACACA 
ATTGAACTAACTCGTTCGTGCAATACCCAAGCCAATGATCATGGTATTGCC 

40 AATACTCTTTTTCAATAAATCGATCCATTAACTTCTTCACAACTTCCAACC 
ATTTATCATTATGATCTTGGTGATATAAGCGTAATAATGCTAAAGCTGCTT 
CACCATCATAATAAATAATTCTAAATGATTCTTTCACAGTTAAATCCGGAT 
AATTTAAAATATGAGTTGTTTCGTATGTATCTTGATTAATCATTGATAAAA 
TTCCTCGAGCAACTTTTTGTGCCACGCATAAGTATTGCTTATTGGGGTTAT 

45 GCTTTAAATATTCACAAACCGCAAATATAAAGGCAGCATTTTGTCCTAATT 
TTATTTCGTTAATATCTTTTGTATCATCAAAGATATATCCAACACCTTCAT 
TATCATAGAAATAATTCTCAATAACGTAGTTAATTGCCTTTTCGACTATAG 
TTAAATCTTCTCCTAAATAATCTAAACCCTCTATTAATGCATAAGTTGAAG 
AAGCATGTCTTAATATATTATAGAAATTGATTTCTTTATCAAAATGTGGAA 

50 AATAACCATATTGATATCTTCCAGTATCTGATAGCATATTTCCTAGGAAAT 
ATGTACCACTTTCAATTAATTGGTCTATTTCTTTATGTAAATAATCGACCT 
TTCTTAATCCTTTTTTATAACCTTCATCGTGTAATTCATATATCTTTTGTT 
CGTCTAAGATAAAACCTTTAGTTTTGAACTTAATGACTTCTTTGTTTTCAT 
AAAAATCATAAGCAAACTTTTTCTTATGGTTCGTATACTTTCTTAAATAGT 

55 TATTTATATTTTGTTCAGATAGAATAAGCTTCGTTTTCCCATCTGTTTTCA 
CTGGTTTAATAAATGCATTAGTGTTTATTTCTTCAGGTAAAAATGATAAAT 
TCCAGTATTGATCTAAAGCTATACCAAAATCAATATT^TTTCTTCTAGTTT 
GCGTTAATTCATCTTTAACATCTTTAAATAAAGTAACTTCTTCTTCAGTAA 
CAATGTCTATTTTTACCCATGAAGGAAACGCTCCTGTTTTCTGTCTAAACT 
TTTGAATTTCTTTAGTAATATCCCTTCTAAGGTAATTAGGCTTTTTTAATA
ATTTAACTTTGGCTTTTATATTAGATTTTCCAAGGCTTAAAAAAGCATATT 
CTTTTATTTCTATTGTTTTTTCAACATGATTAATCAATCTTTCTATATGAT 
TTTCCATACTCATCTTTTTCTCCTTTCAATTTAATATGATTTCAAACCAGG 
5 CCATATAGACCTGGCTTGAAATAACTTATGAATGAATCAATAGCGCTATTT 
TTCTTTATTATCTTTTTTACGACGTCTTCCTAATAATAATGCTCCTAAACC 
TGCAAATAAAGTTCCAAGTAATGTGCCTTTAGAATCATGATCTTCATTTGC 
TCC 

10 

Sequence 3348 
step.1001b04 .cons .ok

CGTGACTATATGATGGTAGAAATGTGTCCTCGTTGTAAAGCGCATTTTAAC 
AGTCGTTGTAAATTTCATTATCATTTATATTTTGAAGTATAAATAACTTGT 

15 TACAAAGTTTTTGAAATATTTGAGCTCCTAAAGTGAATTTTATTTAAAAAT
TCGCTTTAGGAGCTCTTCTTATATCAACATTCTATTCATGCATTTAT^GTT 
TTCTTCTTATACCTCGGTTGATTTTCTCTTATTATAAGATTATTTTTCAAC 
TTTCTCTACTACTTCATTTAATTCATCAATTTGTTTGATAGTTGTCGTTGA 
AGATCCTGAAGAGAAATACCATAGTTTTGGATCTAATTCGTAAATATGATT 

20 ACTTTTTACTGCTTTTACATTTTTTATAACTTTGTTTTTTAAAACTTGATT 
TGTTGTTGCTTTACCACCTACAACTGAACCACGATCCATAGCTAAAATAAC 
ATCTGGATTCTGCTTGTTAATATATTCATTATTTATATTTTGACCATGCGG 
GCTTTTGCTAACCTTTTTGTCTGCAGGTTTAAATCCTAATGTATCAAACAC 
TAAACCACCAAATCTTCCTCCTGGTCCAAACGTTGATAGTTCACCTTCGTT 

25 AACCAATAAATACATTACTTTCTTATTAAAGTCTTTAGTTTTATCTTTCAT 
ATCAGATATTTTTCTATCTAAATCTTTATTAATTTTTTTAGCTTTATCTTC 
TTTATCGTAGATTTTCCCTAAATTTTCTGTATTTTTTTTCATATCTTTAAT 
TAAGTTGTCATCACTTGTACCTACATATACAACTTTAGCTTTTGGTGCAGC 
TTTTTTAAATTCATCTAAATTTTTCTGATTAGCTGTTCTTCCTGAAATAAA 

30 AATCACATCTGGTTTAGCTGATGCAACTTTATCAAAGTTCACTTCTTTTAA 
ATTTCCAGTATTAATATACTTATCATCTTTAAATTCATCTAAAAATTTAGG 
TAAAGATTGGTTATTTTCACCTTTAGGTAAACOTTTTACTTTATCAGCCAC 
ACCTAATTCTTTCAACACATCAAGCGCTCCATAATCTAATACAACGGCATT 
TTTAGGATTCTTTGGTACTTCGACAGTATTAGAGATTTTTTTCTTATCACT 

35 GCCATTATTTTCTTTACCACTTGCTTCAAAACTATTTTTGATGGTTACAGT 
TTCTTTAGAATCACTATTTTTCTTTTTCGAAGTTGAATTATTATTCGAACT 
ATTACTACAAGCCGTTAAAACTAAAACTAGAGACAATAATAAAAATAAGAC 
TGTTTTTTTCATATGTGCAACTCCCTAATCAAATTTATAATTTCTTTTCGT 
ATGGATAACCTCTTTTACTATGCGTCTCAATTTTAATCATTAAACTGAGTC 

40 AAAAGTAGTTTCATCATAATATAGACAAATACGTTGTCCCCTTATCTCCTC 
TATACGTACTTCCATTTCATATAAACTTTTTAAAATGTCAGACTGAATAAC 
ATTATCTTTATCATCAGCTTTAACTAGCTCACCTTGTTTAAGCGCAATGAT 
GTCGTCTGAATAACAAGAAGCAAAGTTGATATCATGTAATACGATAATGAT 
TGTTTTATTTAACTGACGACATAAATCTCGTAAC6TTTGCATAATCTGTAC 

45 TGAATGTTTCATATCTAAATTATTTAAAGGCTCGTCTAATAATATATAATC 
AGTATCOTCTGCTATAGTCATAGCAATATATGCACGTTGTCGCTGACCTCC 
AGATAGTGTTTTAATATTACGATGTTT7U\TTTCATTAAGCTGTAATAAGTC 
AAGAGCATATTCCACTTTGTCATAGTCTTCTTGCTTCATTCGACCTTTTGA 
ATATGGAAAACGACCAAATTTAACTAGTTGCTCGATAGTTATGTTCATATC 

50 AGTGTGATTAGATTGTTTTAAAATACTTAATTTTTTAGCTAAATCGTCAGT 
TTTATATTCTGAAATTTCTTTGTCTTCAATCAAAATTGAACCGTTCTCATA 
ACTAAATAAGCGACTGATAGCAGATAATAATGTACTCTTACCAGCACCGTT 
AGGACCTATGAGTGAAGTCAATTTTCCCTTTTTAATATCTACGCTAATATC 
TTTTAAAATAGGTTTATTGTCTATCGTTTTATCTAAATTTCGAATTTGAAT 

55 CAATTTGCATTTCTCCTTTTGACAAGCAAATATATGAAGTAACTTCCACCA 
ACTAAATCTACTATCAAACTGAATTCAGTTGTTGCTTCGAATAGATTTTCA 
ACTACCCATTGCGCTATAAATAAACTAATCCAGCTGAATAATATTGTTGCC 
GGTAAAATAAATTTGTGTTCATAGGTCTTCATAAATTCGTGTGCTAAGTTG 
ACAGTTAATAAGCCAAGAAATGTTACAGGACCTATTAATGCAGTAGATATA 
GAGACGAGTAAAGCAACCAAAATAAGAAACATACGCGTCATATTCTCATAA
GAAACACCTAGATTAATTGCTTGTGCTCTACCTAACAAAAGAACATCTAAA 
TATGGTCGAAGTATTATTGTAACAATTATTAATATAACTAATAATATGCCT 
GAAACAGTTACTAATTTAGAATTTGATGCTTCAAAGCTAGCAAACATAACA 
5 TTTTGAACTGCTAAAAAAGATTCAGGATTCATTATAAGTTGGAGAAAACTT 
GTAATACTACGGAAGAAAGTTCCTAATATTACACCAACTAGTAATATGAAA 
TAGACTGAAAAGTGACCTAGTTTAAATATAACTTGAAATAATAGCAATGAA 
AAAAACACCATTGCGATCAAAGTGATTAAAAAATTAAGGTAGATATTTGTA 
ACAACAGTTGCTTGTTCACCTAAAATAAAAATAGGTAAGACTTTGACAAAT 

10 AAATACACAGAATCAAGTCCCATAATAGAAGGAGTTAATAATCTATTTGTA
GTAATAGATTGGAAAATAACTACGGACGTTCCTATCGATGCTCCTACTAAT 
AATATAAGAATAAATTTTTTGAAACGGCTTTGGAATTGATATTCAAATATA 
TCAAAATCTAAACCTACCAATAAATAAAAAATTGCCATACAAATTGCCATA 
ACGACCAGTATACTTATTTTTTTATTAGCGCTCATTTGCATAATTTTTCCT 

15 ACCTTTCATTAGCAAGATTAGGAAAATAATTGTGCCAAATAC



Sequence 3349 

step . 1001b10 . cons . ok

20 CAATGGACTTTTATCGACTTGAGTTTCAGAAGCAAATATAAAAAAAGAACC 
TCGAGCCATACCTATTGTTCCAATAAATAATAAAGAAATAACATTAATATA 
TTCCATTTCAGTCATACGGTGTATTTTCTTCATGTTTCTACCCACTTTCTA 
TCATTAAGTGCAGCACAAATAGCATTAGCATAATAATACATATCTTCTGTC 
CAATTATGTGTATTTTCTCTATTCTTTTTACTAAAATCGTTTCGACTAAAA 

25 AATTCTTTTCTTTGTTGAGTCGATATTTCAATCTGTACACCCATGCCATAG 
GCATTTTTGTTGGTTATATTATTAATTTCTCTGCCCGCAATTCTGTCAGGA 
GCAGCTTCCACATTAAAACCTGATU^CTTTTAAATTGTGAGTAATAAGAGAT 
ATAAGTCTTTCATCCAATCCACCAATATAACTATTTGCTTGATTGCTCGAA 
TAACCATGTACGGCTATCGTTACATTCATAAATTGATTCCAATATAATAAA 

30 TTGGGGTTATCATAATTTGTTGAAGTGACGTGTAGAGTTCTATTGTTTTTC 
GGTTTTAAACCTTTAAAAGTAAAATAGTTTGCATTCGATAATTCTGCAACC 
AATAACGCTAATTCAGAAGTGCCACACTCTATACCGCCTCCGTGTATAGCA 
GTTATAAGTGATTTACTATTTCTATCTTGTGTCTCAATCATCCAATCTTTT 
TCATTCCTCACAAGTTCAGTCATAGATTTATAAGTATCCATCACTTTATTC 

35 ACCTCCTTTATCTATCATTCTAATTTCTCCGTAAATATAAGCTTCACGACT 
TACTGACCACTGATCGCTCTGAGAAATATAAACTTTAACATCTCCATTAGG 
TTCAATAGTTAATTGAGCACCACAAGCTTTAACAGGCACCGATCTTAGGAA 
CGCAGTTTGCGTACTTGTAATTAGTTCACTCGGTAACTTCGCTATAACAGT 
TCCACTTTTAAAGTTGTCAGCGTTAATTCTTAACATCACTTCCTTATAGTC 

40 TTGATGTTGAATGATTTTATAAGCGCAATTAAAACCATTTTGTCCTTCAGC 
TTTATAATGCCTATTTTTTATAGCTCCATTAATCAAATCATATTCAATCCA 
GTCTGAAGATTCAGGTAACTTTTCTTGTATTTCAGTAATGCTTTTACGATG 
TTCATCAATTGTGACGTATCCCTCTTGACTTAATATATTTTCAATATTCCT 
AACTTCGATTCTTAAATCTTTTAAATATTGACTTCCTTTTGTGTCAATTTC 

45 GTTGTCATArrCTTCATAAAGATTTTGAATTGTATTTTTAGCCGCGTTAAG 
AGATTCCTCGACGTCTTTTAAACTTTTACTACGCTGTTGATTAAGTTTGTC 
CAATCCTTTTTTTGTTTCAATCTGAATCTTTGTAATTCCTTCATCACTTGC 
ATCTTTCACTTTAACAACGTAATCTTCTAAATTATCTAATTGTTCTTGTAT 
TTCAGTGGCTCGTGCATTAATTTGTCTTTTTAATTCATCAAACATACGGAT 

50 ATATTTAATTTTAGTTGCACCATTAATTTTATTTATTATTGCGTCACCCAC 
TTCGAATTCAAATTCAGTTAAAACTGCAGTTGATGATTTATCGCTATCTAC 
CTGATTTCGATGGTTAATCGAAATATATATTTGACCTATAACAGTTGTATT 
CGTTGAAGCTTGTAAGAATTCAATAGGTATAGTTACTTCAATAACACCATT 
TAAAGGGTCAATAAAACGTACATTTTCTACTACTTGATTAGAACCGTTAGA 

55 GGACTCAAGATATATATACGTTTCAGTGTTTTCTTCACTAATTAATAAAGG 
CTTTTTGTTTTTAGTCACATAAAACCTTAAAACTGCAGTTTTATCATCTAA 
GTTATAAAAACCGATACCTTCATCAGATATCGGTTTTAAGTAAGGTTCATT 
TTTAACTTCAATTTTACCAATCTTTTTTAATTCCATTATTTTTCCTCCCCA 
TTTATAACATTTCTTAAATCATCATCATATATAGTATTAGGGTAAATTTGT 
GTGAACGTATCTTCTTTAAGATTACCGTATTCTCCAGATTTTAATATCTGA
ATGGAATTAGCAGAATGCGTAGGAGTGAACTTCGCAAATAATTTAATTTGT 
TTGATTGTTACAATACCAGCACCTTTTTTCTTAACATCTAGTACCGGCATT 
ACTGTATCAGTACGTTTTGTACCTGGTTCAGTAATCGTCGACATTTGAACT 
5 CCAGCAGCAGCGTAAATAGGGAAAGTTGTATTTCCTTTGTTGAGTCTATGT 
TCAATTGAAAATAAATTACGTTTTCTACTTTTATTAAATCCAAAGAATGGA 
TGATAATTCTGAACAATATTAGGATTAATCCCCACTGTGACATCTCTATCA 
ACGTTTATAGTCACATATCCATAGAGCTCTACAAATCCATTTGCTAAAACT 
TTGAAACGCTGCTGACTCATCATTATACGTTGCCATTCATTCCTTTCTGCA 

10 CGTAGGGTAATGACTTCAGATGTTTTATTGTTAAATCTATCATCATATACC 
ATCACTTTAATGAAAGGATCTAATGATGTTTGACCATTTTGATAAGATTCA 
AATTGTGCCATTCTAAAAATTACATTACCTACCCAGCGTACTGAACGTCTC 
ATTTCTTCGGGACTTTTCTTTTCACCGACACGTTTTTCATTTATTTCTGAA 
AGATATGACGTTGTTTTATTTCGGATTCCTACCCAATTACTAAAAGATGAT 

15 AATGTACTGGAACCCCAAACAACGAGGTCTCCGTTACTTTTTAAATTGTTA 
ATAAGATGAGTCATATTATTATTTTTTTGATTGGCCCAACGCGGATAAAAT 
AAACAGTAATCATTATTTACTGAAATAATATCATGCAAATCTAAGTGAGCA 
GTTAAATTATCTATACTTTGAACTAAGCTACGCATATTTTGTG 

20 

Sequence 3350

step. 1001c01 .cons .ok

CAAAGTGAGAGGCAACACAGGTACAATGGCTCTTAAATAAAGCAT/^TGAA 
CCCAGGAATATATCCGAATTGTCGAAATGCGTCAAACCATTCTTCTAACTG 

25 AGAAAGCGACAATACATCAAATCCTTTCTTAGGGTTGTATTACTTATTCAT 
TATAGTTGTAAATGATATTTTTT6AAACATTAAAATTAGGTCAACAACATA 
AATCAGACTTAGGTAGGAATTAATAAATCTATATATTGATATCTAATTCAT 
CAATTTTATTATTCCATTCTTTATCGAAACTCTAATGAATTGAAAATTTAA 
TATTTCCTATCTGAATTTGACTTCTTATTAATCTGACGTTAATCAATTTTC 

30 ATTTACTCTTATTAATACACATAATAAAAATGAGCCTAGATAATCCTAGGC 
TCATTTTTAAGTTCAACTTAAAATTATAAACGATTTTCTAATTCTTTTTTC 
TTATCTTCAAAGCCTGGTTTACCTAATAGCGCAAACATATTTTGTTTATAG 
GCTTCAACTCCAGGTTGATTAAATGGATTAACACCTAATTGATATCCACTC 
ATTGCACAAGCTAATTCAAAGAAATAAACAACATATCCAAATGTTTCTTCA 

35 TCTAACTGAGGAATATTTACAACGATATTTGGAACGCCACCATCGGTATGT 
GCAAGTAATGTACCTTCAAATGCTTTAGTATTCACTTCATCGATTGATTTG 
CCAGCAAGATAGTTCAGTCCGTCTAAATCATCTGCATCCTCTTCAATTTTG 
ATATCATGTTTTGGATGGTTGACCTTAACCACTGTCTCGAATAAGAAACGA 
CGGCCCTCTTGAACATATTGTCCTAAGGAATGTAAATCAGTTGTGTAATTC 

40 GCACTTGATGGATAAATACCTTTGAAATCTTTCCCTTCTGATTCACCGTAT 
AATTGTTTCCACCATTCGTTGAAATACTGCATAGAGGGTTCGTAATTAATT 
AACATTTCAGTAGTATAACCTTTGCTGTATAAAATATTTCGAATAGTTGCA 
TATTGATATGCGATATTTTGATCTAAATCATCAGAAGATAACTCTTCACGT 
GCCTTAGCCGCACCAATCATGATTGATTCAATATTGATACCTGCAGTTGCA 

45 ATTGGTAGTAATCCTACAGCTGTAAGAACAGAATAACGACCTCCCACATCA 
TCAGGTACAACAAACGTCTCATAACCCTCATTGTCTGCTAATTGTTTAAGT 
GCACCTTTAGATTTATCTGTCGTTGCAAAAATACGTTTCTTAGCTTCATCT 
TTTCCATATTTTTCTTCAACCAATTGTTTAAATAATCTAAATGCAACTGCT 
GGTTCTGTCGTAGTACCTGATTTTGAAATAACGTTAACTGAAAAATCTTTT 

50 CCTTGTAAATAATCAAGTAATTCTTTTGTATAACTTGAGGATAAATGATTA 
CCTACAAATACAATTTCAGGGTATTCCGTATTTGTTCTAAATGAAGATGTA 
AGCATCTCGATTGCAGCACGTGCACCTAAGTATGAACCTCCAATACCGATA 
ACAACAAGTACATCGGAATTTGATTTGATACGTTTAGATGCTTCGACGATT
CTAGAAAATTCTTCTTTATCATAATCAACAGGTAAATCTAACCAACCTAAA 

55 AAGTCATTACCTGCTCCTGTACCTTTATGAATAGTTTGATGGATAGTTTTA 
ACAATATCCTTTTGCTGATCTAGTTCATGCTTATCAAAAAATTCTAAAGTT 
TTGCCATAGTCTAATTGAATGTGAGTCATCTTATTGCCTCCTGTTATAAGT 
ACTTATACTAATTCTACTAAGTTTAGATGCGCTATTCAATTCTCAAGGCTA 
ACTTATCACAGGAAAATTATTTTTTATGTAACCTTTCATTCATAAAAAAGT 
TATTTTTAATATCAAAGGTCACACCACCACTTTAGATATACAATTTACAGA
AATTTTAAAATATATATAAAAGCTTTTATAACAGTATTTCAAACAAAAATA 
TCTAAATATAATCAGATTTAAAAATGAATATTTATGTAAAATAATGGTTTT 
ATTTTCATTATCAATCTGATATGATAGATGCATATTAATAATCATAAAAAA 
5 TTATTTGAGGTGT 



Sequence 3351

step. 1001d08 .cons .ok 

10 TCCCCTTCGGGGGACAGAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGT 
GTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTAAGCTTAG 
TTGCCATCATTAAGTTGGGCACTCTAAGTTGACTGCCGGTGACAAACCGGA 
GGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACA 
CACGTGCTACAATGGACAATACAAAGGGCAGCGAAACCGCGAGGTCAAGCA 

15 AATCCCATAAAGTTGTTCTCAGTTCGGATTGTAGTCTGCAACTCGACTATA
TGAAGCTGGAATCGCTAGTAATCGTAGATCAGCATGCTACGGTGAATACGT 
TCCCGGGTCTTGTACACACCGCCCGTCACACCACGAGAGTTTGTAACACCC 
GAAGCCGGTGGAGTAACCATTTGGAGCTAGCCGTCGAAGGTGGGACAAATG 
ATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGGCTGGATC 

20 ACCTCCTTTCTAAGGATATATTCGGAACATCTTCTACGAAGATGAGGGAAT 
AACGTGACATATTGTATTCAGTTTTGAATGTTTATTAACATTCATTTGTAC 
ATTGAAAACTAGATAAGTAAGTAAGATTTTACCAAGCAAAACCGAGTGAAT 
AGAGTTTTAAATAAGCTTGAATTCATAAATAATCGCTAGTGTTCGAAAGAA 
CACTCACAAGATTAATAACTAGTTTTAGCTATTTATTTTGAATAACAATTC 

25 AAAATATGGTGGAAACATAGATTAAGTTATTAAGGGCGCACGGTGGATGCC 
TTGGCACTAGAAGCCGATGAAGGACGTTACTAACGACGATATGCTTTGGGT 
AGCTGTAAGTAAGCGTTGATCCAGAGATTTCCGAATGGGGGAACCCAGCAT 
GAGTTATGTCATGTTATCGATATGTGAATTTATAGCATGTCAGAAGGCAGA 
CCCGGAGAACTGAAACATCTTAGTACCCGGAGGAAGAGAAAGAAAAATCGA 

30 TTCCCTGAGTAGCGGCGAGCGAAACGGGAAGAGCCCAAACCAACAAGCTTG 
CTTGTTGGGGTTGTAGGACACTCTATACGGAGTTACAAAAGAACATGTTAG 
ACGAATCATCTGGAAAGATGAATCAAAGAAGGTAATAATCCTGTAGTCGAA 
AACATATTCTCTCTTGAGTGGATCCTGAGTACGACGGAGCACGTGAAATTC 
CGTCGGAATCTGGGAGGACCATCTCCTAAGGCTAAATACTCTCTAGTGACC 

35 GATAGTGAACCAGTACCGTGAGGGAAAGGTGAAAAGTACCCCGGAAGGGGA 
GTGAAAGAGAACTTGAAACCGTGTGCTTACAAGTAGTCAGAGCCCGTTAAT 
GGGTGATGGCGTGCCTTTTGTAGAATGAACCGGCGAGTTACGATCTGATGC 
AAGGTTAAGCAGCAAATGCGGAGCCGCAGCGAAAGCGAGTCTGAATAGGGC 
GTTGAGTATTTGGTCGTAGACCCGAAACCAGGTGATCTACCCTTGGTCAGG 

40 TTGAAGTTCAGGTAACACTGAATGGAGGACCGAACCGACTTACGTTGAAAA 
GTGAGCGGATGAACTGAGGGTAGCGGAGAAATTCCAATCGAACTTGGAGAT 
AGCTGGTTCTCTCCGAAATAGCTTTAGGGCTAGCCTCAAGTGATGATTATT 
GGAGGTAGAGCACTGTTTGGACGAGGGGCCCCTCTCGGGTTACCGAATTCA 
GACAAACTCCGAATGCCAATTAATTTAACTTGGGAGTCAGAACATGGGTGA 

45 TAAGGTCCGTGTTCGAAAGGGAAACAGCCCAGACCACCAGCTAAGGTCCCA 
AAATATATGTTAAGTGGAAAAGGATGTGGCGTTGCCCAGACAACTAGGATG 
TTGGCTTAGAAGCAGCCATCATTTAAAGAGTGCGTAATAGCTCACTAGTCG 
AGTGACACTGCGCCGAAAATGTACCGGGGCTAAACATATTACCGAAGCTGT 
GGATTGTCCTTTGGACAATGGTAGGAGAGCGTTCTAAGGGCGTCGAAGCAT 

50 GATCGCAAGGACATGTGGAGCGCTTAGAAGTGAGAATGCCGGTGTGAGTAG 
CGAAAGACGGGTGAGAATCCCGTCCACCGATTGACTAAGGTTTCCAGAGGA 
AGGCTCGTCCGCTCTGGGTTAGTCGGGTCCTAAGCTGAGGCCGACAGGCGT 
AGGCGATGGATAACAGGTTGATATTCCTGTACCACCTAGTATCGTTTTAAT 
CGATGGGGGGACGCAGTAGGATAGGCGAAGCGTGCTGTTGGAGTGCACGTC 

55 CAAGCAGTAAGGCTGAGTGTTAGGCAAATCCGGCACTCATAAGGCTGAGCT 
GTGATGGGGAGAGGAAATTGTTTCCTCGAGTCGTTGATTTCACACTGCCGA 
GAAAAGCCTCTAGATAGATAACAGGTGCCCGTACCGCAAACCGACACAGGT 
AGTCAAGATGAGAATTCTAAGGTGAGCGAGCGAACTCTCGTTAAGGAACTC 
GGCAAAATGACCCCGTAACTTCGGGAGAAGGGGTGCTCTTTAGGGTTCACG 
CCCAGAAGAGCCGCAGTGAATAGGCCCAAGCGACTGTTTATCAAAAACACA
GGTCTCTGCTAAACCGTAAGGTGATGTATAGGGGCTGACGCCTGCCCGGTG 
CTGGAAGGTTAAGAGGAGTGGTTAGCTTCTGCGAAGCTACGAATCGAAGCC 
CCAGTAAACGGCGGCCGTAACTATAACGGTCCTAAGGTAGCGAAATTCCTT 

5 GTCGGGTAAGTTCCGACCCGCACGAAAGGCGTAACGATTTGGGCACTGTCT 
CAACGAGAGACTCGGTGAAATCATAGTACCTGTGAAGATGCAGGTTACCCG 
CGACAGGACGGAAAGACCCCGTGGAGCTTTACTGTAGCCTGATATTGAAAT 
TCGGCACAGCTTGTACAGGATAGGTAGGAGCCTTTGAAACGTGAGCGCTAG 
CTTACGTGGAGGCGTTGGTGGGATACTACCCTAGCTGTGTTGGCTTTCTAA 

CCCGCACCACTTATCGTGGTGGGAGACAGTGTCAGGCGGGCAGTTTGACTG
GGGCGGTCGCCTCCTAAAAGGTAACGGAGGCGCTCAAAGGTTCCCTCAGAA 
TGGTTGGAAATCATTCATAGAGTGTAAAGGCATAAGGGAGCTTGACTGCGA 
GACCTACAAGTCGAGCAGGGTCGAAAGACGGACTTAGTGATCCGGTGGTTC 
CGCATGGAAGGGCCATCGCTCAACGGATAAAAGCTACCCCGGGGATAACAG 

GCTTATCTCCCCCAAGAGTTCACATCGACGGGGAGGTTTGGCACCTCGATG
TCGGCTCATCGCATCCTGGGGCTGTAGTCGGTCCCAAGGGTTGGGCTGTTC 
GCCCATTAAAGCGGTACGCGAGCTGGGTTCAGAACGTCGTGAGACAGTTCG 
GTCCCTATCCGTCGTGGGCGTAGGAAATTTGAGAGGAGCTGTCCTTAGTAC 
GAGAGGACCGGGATGGACATACCTCTGGTGTACCAGTTGTCGTGC 


Sequence 3352 

step.1001e0B.cons.ok

TTCATTACCTAATGCAGGTAAAATTTTCTTAATTGCCTGTGGCATAACTAC 

TGATTGCATAGTCTGTCTATAATTCAATCCTAAACTTCTAGCAGCTTCAGT
TTGTCCTTTATCTACAGCATTAATACCAGCTCTAATAATCTCGGCAATGTA 
GGCTGAAGAATTTATAACAAGCGCAATAGTACCACAAATTAAAGCTGAAAT 
ATCTAACCCCAAAGCTGCAGTAGTCCCAAAAAATACAATAAACACTTGCAC 
AAGCATAGGAGTACCTCTTAAAAATTCTATGTATATACTTGCTATCCATTG 

TAATGGTCTAATTTTACTTATTTTTAATAAAGCTATAAAAGAGCCTAAAAT
AGACCCCAATACCACACCTACTAATGAAATCAAAATAGTATTTTTAATTCC 
TTTAATGAAAAAACTGCCATATTTAGAAATAAAATTTCCATCATCTTGCAT 
ATCTTCGGCTGCTTTTGTCATGTATTGATCAATTAAATTTTTTTCTTTTAC 
ATTATCAATCGTTTGATTTAGCTTATCTAAAAGTACAGGAGAGTTCTTAGG 

AACAGCTATACATGTTTGTTTCTTTTCTTCATTAAATTTAATCTTTGAAAA
TGTTAGTTCTGAATTTTGTTTTAAATATGCCTCACCTACTGGTTTTTCAAC 
AACTACACCTGCTACTTTTCCACTTTTTAAAGATAATATAGCTTCAGGTAA 
TCTATTGAGTGAAGAAATTTTACTATCTTCAATCTCAGTTTGTGCAATTTT 
TTCTTGATCTGTCCCTTTTTGCGCAGCAATTTTTTTGCCTTCAAAATCTTT 

ACCTGTTGTCATCATCATTCCAGGATGTGTGAACATTACTTTTCGTGCTTC
ATTAAGCATCATCCCCCATTCGGCTGTAGGTGCCTTAACACCTAATCCAAG 
GAACGAGAATCCTGACATTTGTAATATCATTGAGCACATCGAACTACTAGC 
AATAATCGCTATGTCAGTAAAGGTTAGTGGCAAAATATGTTTGCGAATGAT 
TGTTAAATCATTCATACCAATTACTTTGGCAAATTTTACATGATCAGCTTC 

AATATATTGCATTACACTGGTTCGAATCACGCGACAAAACCACGCCCATCG
AGTCAATATAAACGCAATAATAATATTTTCTACACCCATGCCAAACAACGT 
AATCAATGCCAATGTCACCACATAGCTTGGAAAAGCTAACATCACATCGCA 
TATACGCATAATTATTGCATCGATATAACCTGGGAAATAACCTGAAATAAA 
CCCAAGTATCGCTCCTATCACAACGGAAATAATCAATGCGACAAATACATA 

TAACAAACTAGGTCTTATGGCGTATATTATCCGTGTTAATACGTCCCGGCC
TAAATGGTCTGTCCCCAACCAGTGAGACCAACTTATACCAGCAAATTTATT 
TGCTGTATCAATGTGATTCGCTTCATAAAATGTAATTAAAGGAGCTAAAAC 
TCCAAGCACTACGTAGATTGTAATAATAGCTATAGCAACTATGGCACCTCT 
ATCTTGAAATAGACGACGTAACACAATCATTTCATTGCCTCCCTTAAACGC 

GGATTTAATATTGCATTAATGATATCTGCAAGTGTATTAAAAAGAATAAAC
AACACCGCGACAATGAGCACGTATGCTTGAATTACTGGAAAATCATGTTCA 
GTAATCGCTTTAACACTTAATGGTCCTAAACCTGGCCACGCAAATACATTT 
TCAATAACGACTAAACCTCCCATGATCATTGGTATCGACATACAAAAAATA 
GATACGGCTACCTGTAATGCATTACGTAATACATGCAGCAACATTGTCAGA 
CGAGGTACACCACTAGCTTTTAAATACAAAACATAATCTTCATTTAATTGT
TCAAGCATAGATCGTCGGACATTTCTAAAATAAATTCCTGCATAGGTAATC 
GTAATAACAAAGACTGGAAGAATATAACTTTCTGGACCTGTGAGTCCTGAA 
GTTGGTAATAAATTGAGTTTTACAGATACATAAATAACTAATATTGAAGCT 
ACCCAATATGAAGGCAGAGCTGTTAGTAGAAAAGCTATTGAACGTATAGCA
CGATCTGTAAATCGACCACTAGTCAAAGCACTCACTATGCCTAGAACAACA 
GATGTTATCATCACCACCACACTGGAAATCAAAGTGAGTTTTAATGTATTC 
ATAAACGCTGGACCCATACGCTCAGATACTGGATCCCCTGTGATATAACTC 
GTTCCAAATTTAAACTGCAAGGCATCAACAAGCCAATTTTTATATTGCGTG 

AATATCGGTTGATTTAATCCATATTTTTCTTTAGTTTCAGCTATTAAATCT
TCTGTTATATTAGGTGTTCCTTGGGCATGTAAAATTGTTACTGCTGGATCT 
TCGTTTGTGAGATAGGTCATCATGAAAGTCATAAAAGTCACTACAATTAAT 
AATGGAATCATGAGCGTAAGACGTTTCATAATAAATTTAACCATGACGAC6 
CTCCTTTATTTATATTGCATTTCATTAAATGGAAGCTCATATTGTGATTGT 

GTAAAGGATAATTTTTGTAAATCTTTAGGTGCAACAACAGTCATTCTGCCG
TGAGTAATTGGAATAAAGACACCTTCATCGTCAACTTGTTTTAAAATTTGC 
TGATACGCTTTTGAACGTGCCTTTGCATCTTGAATTTTGAAAGCATCATCA 
ATATCTTTGTACAGTTGTGTCTTATTCTTAATACCTAATGTTGCACTTTCA 
TATCCCGTTTTCGTCTTAAATGCTGCTAATGTACTCTGAGGATCATATAAT 

AGCCCCCATGTTTGATTAAACATTAAATCGTAATCACCAGAACTACGACGT
TCAGCTACTTTATCTGACGTTTCTCCGTTTATATCTAGTTGTACACCTAAC 
TTCTTAAATTCTGCTTCTAAGAATTCAGCTTGTTCTTTTTGACTGGAAGAC 
CCTTTGTCATAATACAACTTAAGATTCAAATCTTTGCCCTCTTTTTGACGA 
ACTTGTCGATCTTTTGAAAGCACCCATCCAGCCTTGTCTAATAACGCTTGC 

GCTTTTTTCTTATCATATGTTCTTGTTGGTAAATTAAAGTTTATATCTGTC
ACATTTTTAGCAAATAGTTGTGTGGCTGGCTTTTCTTGTTTGTCTAAAATA 
TCTTGAGCTATTTTATCTCTATTTACCATGTGACCTAATGCTTGTCTGACT 
GCTTTATCACTGACTGCACTATCTTTCTTACCAGAATTAACAACAAGCATT 
TTTGTATTCATAGCCTGGCTACGTTTTACTTGGTAGCTTCCGGTTTCTTTT 

AATTGTTTTAAACTATCTTTATCTAAGCTGTCTGTACCTCTATCATCTGTA
TAAGCAAAGTTGGTTTCTCCTTTTTTCATTGATAAAAATGTTGTTTCTCCT 
GCAGGTTTAACTTTTGCTTCTACTTTATTTAATTTTGCCTTTTCTCCCCAA 
TATTGATTATTTTTATTAAACTCTGCAGATATATCTTTTTTGTGTC 


Sequence 3353 

step.1001e06.cons.ok

TTAAAGTAGGTCAACCAAGAGAAGAAGGAACTCAAGTAGGTCCTATTATTA 
GTAAAAAACAATTTGATCAAGTTCAGGATTACATTGATAAAGGTATCAATG 

AAGGAGCAGAGTTATTCTACGGTGGTCCAGGTAAACCAGAAGGATTAGATA
AAGGATATTTTGCGCGTCCAACTATATTTATTAATGTAGATAATCACATGA 
CGATTGCACAAGAAGAAATTTTCGGACCTGTCATGTCTGTAATCACTTATA 
ATAATCTTGACGAAGCTATTGAAATTGCTAATGATACTAAATATGGTCTTG 
CTGGCTATGTTATTGGTAAAGATAAAGATACATTGAGACATGTTGCTCGTT 

CAATCGAAGCTGGTACAATTGAAATTAATGAAGCTGGTAGAAAACCAGACT
TACCATTCGGTGGTTATAAAGAATCTGGCTTAGGTCGTGAATGGGGCGATT 
ATGGTATCGAAGAATTCTTAGAAGTAAAATCAATAGCTGGATATTTTAAAT 
AATTAGTAAATTAAATTAAATGTTGCAATTAAACAAAGCACGGAACTAATT 
TTGTTCTGTGTTTTGTTTTTTATGTTAAAATTCCTAAAATCATTTTGTAAT 

TAAAATGAAAATTTGCTATAGTTAGTCTAGTAATTAGAAGGTTGAGAAATA
TACAAACGTATAAATATAGCTATAATGTGACTTTTTATTATTGTCAGTCAC 
ATAAATATCGAGATAAGAATCTATTATTTGGTTAAGCTAACTATACGTTTA 
TTATTTTTGAACTTCGTAAAATAAATAATTAATGAAATGGGTGCAAGTCTA 
TGCCTCGTAGAGAACGAACATCTCCGCAGTATGAATCTTTCCACGAATTAT 

ATAAAAACAATACTACTAAAGAGCTCACTCAAAAAGCCAAATCTTTAAAAT
TAACTAATTACAGTAAATTAAATAAAAAAGAATTGGTACTTGCCATTATGG 
AAGCACAAATGGAAAAAGATGGTAATTACTATATGGAAGGAATATTAGATG 
ATATTCAACAAGATGGATATGGTTTCTTAAGAACCGTTAACTATTCTAAAG 
GTGAGAAGGATATTTATATTTCTGCAAGCCAAATTCGACGTTTTGAAATAA 
AACGTGGTGATAAAGTAACGGGTAAAGTTCGTAAACCAAAAGATAATGAAA
AATATTATGGTCTACTTCAAGTTGATTTTGTAAACGACCATAATGCAGAAG 
AAGTCAAAAAACGTCCTTCACTTCCAAGCTTTAACACCTCTTTATCCGGAA 
GAAAGAATCCTATTAGAAACGCAATCTACAAATTATTCCACTCGTATTATG 
GATTTAGTCACACCAATAGGTCTTGGTCAACGTGGTCTTATAGTTGCACCA
CCTAAAGCTGGTAAGACAAGTTTATTAAAAGAAATCGCTAACGCAATAGCG 
AGTAATAAACCGGAAGCGAATTTGTTTATATTACTAGTAGACCTACTCAGG 
AATTAAGAGAGCAAGGTGTACGTGTGAAACTGTCGGCTGTTAAGGATATTG 
TTGATGGTAAAGATATCGTACTTGTAGATGATTCGATTGTTCGAGGTACAA 

CGATTAAACGCATAGTTAAAATGCITAAGGATTCAGGAGCTAACCGCATTC
ACGTAAGAATTGCTTCTCCCGAATTCATGTTCCCTAGTTTTTATGGTATTG 
ACGTATCTACAACAGCTGAACTCATCTCAGCAAGTAAGTCTCCTGAGGAAA 
TTAAAAATCATATTGGTGCAGATTCTCTTGCTTATTTAAGCGTTGATGGCT 
TAATCGAGTCTATAGGACTTGATTATGATGCGCCATATCATGGCTTGTGTG 

TAGAAAGTTTTACAGGTGATTATCCAGCAGGACTTTACGATTATGAGAAAA
ATTATAAAAAGCATTTAAGTGAACGTCAAAAATCATATATAGCTAATAATA 
AACATTATTTTGATAGTGAGGGAAATTTACATGTCTAAAGCATATGAGGAA 
TCTGGTGTAAATATTCAAGCAGGATATGAAGCAGTCGAAAGAATAACAAGT 
CATGTTGAACGTACATTGCGCAAAGAAGTATTAGGTGGTTTAGGTGGATTT 

GGTGCAACATTTGATTTGTCTCAATTAAAAATGAAAGCGCCAGTTCTGGTA
TCAGGTACTGACGGTGTGGGTACAAAGTTAAAATTAGCAATTGACTATGGA 
AAGCATGACACAATTGGTATTGATGCTGTCGCAATGTGTGTAAATGATATT 
TTAACAACAGGTGCTGAACCTTTATACTTTTTAGACTATATTGCCACGAAT 
AAAGTAGTGCCAAGTACTATAGAGCAAATCGTTAAAGGTATAAGTGACGGT 

TGCGAACAAACCAATACGGCACTTATAGGCGGTGAAACTGCTGAAATGGGA
GAAATGTATCATGAAGGTGAATATGATATTGCTGGTTTTGCAGTAGGAGCG 
GTAGAGAAAGAGGACTATATTGATGGTTCAAATGTTGAAGAAGGACAAGCA 
ATTATTGGTTTAGCTTCAAGTGGTATTCATTCAAATGGCTATAGTCTAGTT 
AGAAAAATGATAAAAGAATCAGGAGTTCAATTACATGATCAATTTAATGGT 

CAAACCTTTTTAGAAACCTTCCTTGCACCAACAAAATTGTATGTAAAGCCT
ATTCTTGAATTAAAGAAACATATTGATATCAAAGCGATGAGCCATATTACT 
GGTGGAGGTTTCTATGAAAATATTCCGCGTGCCCTTCCTAAAGGTTTATCA 
GCAAAAATAGATACACAATCATTCCCAACGTTGGAAGTCTTTAATTGGCTT 
CAAAAACAGGGCAACATTTCAACGAATGAAATGTATAACATATTTAATATG 

GGTATTGGATATACAATTATTGTTGACAAAAAAGATGTTCAAACAACATTA
ACAACGTTACGTGCAATGGATACAACTGCATATGAAATTGGTGAGATTATA 
AAAGATGATGATACACCTATTCATTTATTGGAGGTAGAATAGTGACCAATA 
TAGCGATTTTTGCCTCAGGATCAGGTAGTAACTTTGAAAATATTGTAAAAC 
ATATCCAAACAGGGCAATTATCTGGTATCAATGTAACGGCACTGTATACAG 

ATAATGAAGGGGTACCCTGTATTGATAGAGCTAAAAATTTAAATATTCCAA
TTCATATTAACAAGCCAAAAGACTTTTCATCCAAATCTTTATATGAACAGC 
ATCTACTTAAATTATTATCCAGTGAGGAAGTTCAGTGGATTGTATTAGCTG 
GTTATATGAGACTAGTTGGCCAAGACTTATTACAAGCTTATGAAGGACGGA 
TATTAAATATACATCCCTCATTATTGCCTAAATTCAAAGGTTTAGATGCCA 

TAGGTCAAGCGTTAGAAAGTGGAGATACTGTAACTGGATCAACTGTCCATT
ATGTAGATAGTGGGATGGATACGGGAGAAATTATTGAACAACAGCAATGTG 
ATATAAAACCGGACGATACTAAAGAACAATTAGAAGATAGAGTAAAACATT 
TAGAATATGAACTTTATCCAAGAGTTATAGCTAAAATCATTAAATAGAGGA 
GTATCTTATAGAATGAAAA 


Sequence 3354 

step.1001e12.cons.ok

TAGTGAATGAAGACCCTGAACGTAAAATCATTATCGTTTCAGCTCCAGGCA 
AAAGGCATAATCACGACATTAAAACTACTGATTTATTAATTCGTCTCTATG
AAAAAGTACTTAATAAATTAAATTATGAAAGTAAAAAACAAGAAATTATCC 
AAAGATATGCTGATATAGTAGAAGAATTAGGTATAGGAAATGACATTTTAA 
TAACAATTAATGACACTTTAGAGGAATACATTAAACATCTTTCTGACAAAC 
CTAACCGTTTATATGATGCTTTATTATCTTGTGGCGAAAATTTTAATGCTC 
AATTAATAGCCCAGTATAATAATAGTCAAGGTATTCCTACTCGTTATATTT
CTCCTAAAGAAGCTGGATTAACTGTAACTGATTTACCACAGCAAGCTCAAA 
TTTTAGATTCCGCATATAATGAAATATACAAATTGCGTGATTATGATGAAA 
AGCTAATTATTCCTGGTTTTTTCGGAGTTTCAAAGCAAAATTATATCGTTA 

CGTTTCCACGCGGTGGTTCTGACATAACTGGTGCTATCATAGCACGTGGCG
TCCGAGCCTCACTTTATGAGAACTTCACTGATGTATCAGGAATATATAAAG 
CTAATCCGAATATCATAAATAATCCTGAACTCATAGAGGAAATAACTTATA 
GAGAAATGCGAGAGCTATCTTATGCAGGATTTGGAGTTTTTCACGATGAAG 
CTCTACAACCTTTATACAAAGATCGAATTCCCGTAGTTATCAAAAATACTA 

ATCGTCCAAATGATAAAGGGACCTACATTTTACATGACCGTGAAATCGATT
CTAAAAATGTCATTAGTGGAATTAGTTGTGATAAAGGCTTTACTGTGATTA 
ATATTAAAAAATATTTAATGAATAGATTAGTTGGATTTACACGAAAGATTC 
TTGGCGTTTTAGAAGAATTTAATATATCATTTGACCACATGCCTTCTGGTA 
TTGATAACATAAGTATTATCATGCGTACAAATCAAATTCAAGGTAAAGAAA 

GTCAAGTTCTTAATGCCATACGCAAACGTTGTGAAGTTGATGAATTAAGTA
TCGACCATGATTTAGCAGTACTAATGATTGTTGGTGAAGGTATGAATCAAG 
TTGTTGGTACAGCTAGTAAAATTACTCACGCCCTTTCAGAATCAAACATTA 
ATTTAATAATGATTAACCAAGGTGCTTCTGAAATTTCAATGATGTTTGGAA 
TTCATGAAGCAGATGCTGAAAAAGCAGTATTATCTACGTACGAATTTTGTT 

ACAACGGTGTTTGTTTAAAAAATTTGTGTAAATAAAATAAAAAAGATTTCT
ACATCACTTTCAATACATAAAGTAAATTTATTAACTCTGTATCAATGATGT 
AGAAATCTTTTTAAGATATCTTTTTAGTGGAACTATTTACTATAGCTTTTT 
TAATACGCTGTTCACCTACTACCAATTTTAAAATATAGAATTGATAATAAC 
TAAGAGAATCAATAACAGCTTTATAATTACTAAACTGATGATATTTTTGTT 

CACTCATTAAATCATACATAAGAACTATTTCTGGCTTAATATAATCTACGT
TCCAACTTAAAGAATGAAAATATATATTATTTTTTGGTAAACGTATATGAT 
GATCTAACCTAAATAGCCATTCATCATTTACAACATCATAAACAAATATTG 
TCATGACTTCTGTATCGTTTTTAAAGACTTTAACATGTTCAACATCTGATA 
AAGATAAAGATTCTCTTAATCTTTCATATAACTGATTATCATAATAGTGAA 

TCGTAAATTCTTCAGGAATCACCCTGATAATTTCCAATAACTTTTTGCGCT
CAACACAAATATCTATTTTCGATGGTAATATAAAATCATCATTCAGAAATA 
GATGCATCGCTACCTCGCCGTGAAATTTGAATAAATGAGATGATGTTTTCC 
TTAAAATTTCGACTATTTTATTCACTGGTTTTTCTTCTTTAACAAACATGT 
TACCACCCCTATAGCAAGCGCTTTCACAATCAAATTATACATTTTCTTTTT 

TCACTTATCAAACAAATAAATGTGAACATTTAGTAAAATTTATTTTATTTT
CAATAAAATTATTTTCATAAAAAGTTACTATATCACTATTCGCTACGCATA 
TGATAGGGCTCTCATAAAAATTCACGAAAGTTTAATTGATATGTAAATATT 
TCGTCACTACTATTTGATAAAACACATTTCCCCTTTCATTTTCATTGAACT 
CAAAATTTAAGAAGTTTATAATTTATCTTGCAAATAGAATTTAATATTAAA 

AGATTTGGGTGAGTAGTAATGTCTTTAACGATTATCTTATCTATAATAATT
ATTATTTTGATTATGGCTATGGTTCTTAATCAAAAATTTATGAAAGATAGG 
GTTGAAACAGAAGAATATGCTAGAAATCAATTAATCTCTAAAAATTCAATT 
TTAAGTGAAGAAAATTTATCATTGAAAAACCAAATGTTAAGTACAAACAAT 
GACGTCGGTCAACACGCTTTTAAAAACGCCAAGCGTGAATTAAGAAAAATA 

TTAAATAGATTTAAAGAAGAGGGTCGTTTACGATCATATACAATTGTTCCT
ACGAGTAATTTGGCTGTTAAACATCCCCTTTTCGAATATGCACX3TTCATTC 
GATTTTATTATCATTACTGATGTTGGTTTGATAAATGTGGAT6TTAAAAAT 
TGGAACCAAAAAACGTTTTATCATTTTGATGTGCCAGATCAACATCTTGAA 
GAAGGACAACCACAATATAATACCGAAAAAGTTGTCGGTCATTATATTAGC 

AATCGATATCATAGTCAGTTTAAAACAACACGTTCTGGTGTCTATACTTTT
ATTGAGATTTTACAGGATAATCGTGTAATATATGAATTTTATGACCACGAT 
CCATACGATAAAGCCGCAAACAATGCAAAAGCATTAAAAGATAAAATTGAA 
AATCATTATAATTTTAAAATTCAAAGTATTGGCGTCATATATTTTAGTGAT 
GGTAGCGTTAATATTATTGAAGGATCCGACGAGAGTGATAAATACGTCGAC 

ACCGTATCTACACCGATATCACTTGAAAAAGTAATTGAGGAAGCTATCGAT
TTATCTAAGCACCCCCTTACTGATAAACAAATCGAAGAAATTTCTGAAAAC 
TTTAAACAACATATGAATAATTAAAATTCAAAAAGAGTTTGGGGCATGAAG 
TTATTTGACTAATAAAAAGACAATTTCTATTTATAAAATGTAGAAATTGTC 
TTTTTATTATAATTATTTAAATTGTTGAGCAACTACTTTTTTTATATTCAT 
TAGCATTAATACAAATGCAATTTCTCTTTTGACTTTAATTATGTTTCGAAT
CGACATTCTGATGAAACTTAAAATAACTTTCGTAAATCCATCAACAAGTTC 

CACATCAAATTTCTTTAACTGCAAACGTTTCTTCTAAAATAAGAAATCTTA 
TCCCAATTTGAATTTTAAAATATCTTAAATTGTAATTTTACATTCCAACTT 
ATTAAATTATCTATTATCAACAATGTTGTCATATATATTTTTAATTTTAAC
GTTTTAATACGATTCAGTAGTCATAGCACACATTTATATGACAACACTTAT 
AGCAATTATTTCATGAATAAATAAACTTAATTATAAACTGTATCAGTTTAC 
ATTAGAAATCTTTTTAAAACATAATGTTTTTCACTAGACTAAGAAGTAGAT 
ATCCTAATTTCTACTTTATTGAAACACTAGTCAGAAACGTAGTTGAAATCT 

ACTAAATATCACTCTTTTTTTATTCTAATCTCTTTAGCTAGAATAAGAAAA
CGCTGCTAAACAAAATGGTCCAACTGAGTTTCTACTTATTGATATAACAAA 
TGAATTTTATATTTCCACTCAATATCAGACTCTAAATTATCGCTACTAGCT 
CCAAATATTTAATTTTCTCTTTTTTGCCTCTTCTTGAGCGTTTACGAAGGT 
ATCCCTATATTTTCCGTTTGGAGAAAAATACTTTTCTCTGGCTAGACCTTT 

TTTCACCAGCGATT



Sequence 3355 

step.1001f07.cons.ok

ACTTGAAAAATCAAATGCAGATCGTGACATTTTCTATTTAAAATCTATTGA
TAACAATATTCGTGAATATCATATAGCAGAATAATCTAAACTCAATTCTTT 
TGCTTTTTTAATTTTTTAAAATGAAAGAAACGAATTTACTATTCATACTTG 
CCCCCTAAAAACAACGTCATATAGCTGTTTTTAGGGGGTTTTTAATCCCAA 
GATAGTGCATTTCATCTATTATAGACCTCTACTATTTATATACTGTTTACC 

TACTATTCATAGATAAAAACGAATCCTATCCATAGATCAACATTGACAAAT
AACTAAAGCAATTCTATACAATAACGTATCTCAAGGTTTCGTTGTTTTTTT 
AGGTCTGTATAAATGTGTCATATCTATTTCTTCCAATTTTAATAATTTTAT 
TTTGTTGTCTAAAGTACCTCCGTATCCTGTTAAATTTCCATCTTTACCAAC 
CACTCGATGACATGGTATAAGTATAGAAATAGGATTACTCCCAACAGCACC 

TCCTACAGCCTGAGCAGACATTGCTGGCTTATTCATTGCTCGTCCTACTGT
TTTAGCAATCTCTCCATATGTTTTAAGTTCTCTATAACGAATTGTCTGAAG
TTCATTCCATACCTTCTGTTGAAAATCAGTGCCTTTTGGTGCTAACGGTAC 
TTTTATCTCAGGATAATTCCCTTTAAAATACTCTCTTAACCACGCTTTAGT 
ATCTTTAAAAACATCTAACGTATCCTTTGTTTTAATTCCTTTCAAATGTTG 

ATTAGGGTATAAGACATGTGACAACGAGATGCCATCGTTTGTAATTAATTG
CAATAAGCCAACGGGCGAGTCATACATTGATTGATACATATTTCTCCAACT 
CCTTAAATGTGTATATCATTCTAGTCATATCCTTATAAACTAACGCATTCT 
ATTCTGAACATAATTTTATAATATAGATAATTGCTATATTTATGAAATTGG 
ATAAACATTCGAATAGAATGTGATACACCATTAACGATAACTAAAATTGAA 

AAATGTCATCATTTATATATTAAGTCAATTATCGACAGGTCATACTTTAAG
AAAGTCTGGAACTTTGATTTACGTCCCAGACTGGTCAAGATATAGAATTTA 
AAGTCCAACTCACCCTTTTGATTAGATGTCATTATATTTATTGAAAATTAT 
AAAGTTCATACCCTTTATTTTTTGCATATTCTAATGCGAGAATTTCTAAAA 
CATTAATGAACGCAGGTTCGTCTTTTACTTTTATATATTGAAAAGTGAGGC 

CGTATTTATCTGCAGTAATGTCTTTTAAATGATTTGGATTTAAGTCCTTAT
TTTGAGCATAAGCTTTTCTAAATTTAGTCGTACCTGTTTCTGAACGTGAAT 
TGTATTGTCCTAATTTTGTATACAAATCACTATTTATAGGTTGATGTGTGG 
ATGAATATCCTCCAGTTTGACCTACAAACACTGGTTTTTTCTCATCGAAAA 
TCACATAGACACCAAACTTATCCTTAACTTTTCCCCTATCCTCTGGTGTAA 

ACGCCAAAGCCTTTTTATGATCGTACTTAAATAATTTTTTTAGTTTCTTCT
GATTATCTTCTAGTAACTCATTTAATTTTTTGCTACCCTTCATACATTTAT 
TCCTCCCCATTAGCAATAAACTTATATACATACATTTTACATGTAGTATAT 
CACTTTTTAAAACTTTTAATCATTTTCAGTTTTAATAATTTTCACTATTAC 
CTCAGGTACTAAAAAACCTAAGATTGAAAACACAAAACCATTTAAAACGTA 

GCCTAAAATACTCCATGTGTGATTATAGCTCATCTGTAATTGTGTTCTCGT
TTGAGCTATACGGTCACGTTCAAATCCATGCACGAGTCCCGCAACTACTGC 
TGCAATAATCCCTGATGCATGGAATAATTCAGCAATCAAATAAGTGACAAA 
CGGCGTCAGCAATTGTATAAATGTAAACATGTTAATGTTCTCAATGCCACG 
TCTCATTAATGTTAAACGAAATCTCACTAAAGCCATGCCTATAATTAGACC 
AACAATAGCACCTCCAATAGAGGCAATCAAAAATTGTTGAATCGCATCAAA
AATGGAAAAAGTTCCAGTAATCAATACACCTACGGCAATTTTATVATGAAAT 

GATACCTGCAGCATCATTAAGTAAAGATTCACCTTCAAGAATGGTCATTGA 
ACCTTTTGGTAAAACCTTACCTTTCGTGATTGCTTGTACCGCAACAGCGTC 
TGTAGGACAAAGTATAGCCGCAATTGCGAAAGCAGCACCTATAGGTAATTC
AGGCCAAATCCAATGAATAAACAATCCTACGCCTATAACGGTAGTTATGAC 
AAGACCTAGAGCCATCATCATAACAGGTTTGATATACTTCCTTAAATGTAC 
TCTTGAAACATTAACACCCTCAACGAATAGTAGTGGTGCAATCAATGTAAC 
CATAAATAATTCAGAATCAAAATTGAATTCAACTGGAATGGGAGTGAGGTA 

TAAAATCATCCCAAGGAAAATTTGGATAAATGCAAGAGGTACTTTAGGTAT
GAAAGTATGTACAAAGGAGCTCACAATAACTAGTGCGACAAATATGAGTAG 
AGTTTCAAATATCTCCACACTTTCACCTCTTTATTAATTTTTAATTGCAAA 
TTTAGTCATCAAAACAATAAAAAGTTGCAAGGTTATCTACTGCAACACTGA 
AGAAGTGAAATAATAACAATAGAGTCATTCCTGTAAGGACATAACTATTAA 

CACTCAATCATGCCACAAATATACTATTCAAAACTCAATTATTTTCTTTAT
CAGTGAAATCAAACATGAACATTTAAATTAACTTATGAAGCTCGCTTCTTC 
GAACGAGAAAATGCTTATGTGTAATTTCATTGAAAAAATTAGTGACAACTG 
TCATTCACAATTATTCTTGTTTACAATTTTATCATTTATTAACCAGATTCT 
TAAATAAATAGCTTTTATATCATTAATATCTTCACCTCAGATATACAAAAA 

ACTACTAGCATATAAAAGATATATCTTTTATATGCTAGTAGTTTAAAAGCT
TAAACAATCAATCGTCCATTTTATTAATATATTCTTTTTCTTTATAGTATA 
ATTTTTCTCGTCTTTCTTCTCGTAGTTTTGCACGTTTTTTTAATTCTTCTG 
GCGTATTAAACTCTTTTTGGCTTAATAAGATAGTAGATTGTCTAGATGCAT 
CGTCTTTCCTGTACATATATGCACCATTACGGTAGGCCGCTCTACTTATCA 

AATGCATTCCAACCGGAGATGTAAGATTGATAAAAACTAGTGATAATAATA
ATCTGACACTGAAAAAACCTGAAACTATGCACTCATACATCATAACAAATT 
TGAATCTTTTATAGATGATTTCAACCTTACTGTAAATCAAGAGATGAAATG 
GGCAAT 


Sequence 3356

step.1001g01.cons.ok

TGGGCTGTTTCCCTTTCGAACACGGACCTTATCACCCATGTTCTGACTCCC 
AAGTTAAATTAATTGGCATTCGGAGTTTGTCTGAATTCGGTAACCCGAGAG 

GGGCCCCTCGTCCAAACAGTGCTCTACCTCCAATAATCATCACTTGAGGCT
AGCCCTAAAGCTATTTCGGAGAGAACCAGCTATCTCCAAGTTCGATTGGAA 
TTTCTCCGCTACCCTCAGTTCATCCGCTCACTTTTCAACGTAAGTCGGTTC 
GGTCCTCCATTCAGTGTTACCTGAACTTCAACCTGACCAAGGGTAGATCAC 
CTGGTTTCGGGTCTACGACCAAATACTCAACGCCCTATTCAGACTCGCTTT 

CGCTGCGGCTCCACATTTGCTGCTTAACCTTGCATCAGATCGTAACTCGCC
GGTTCATTCTACAAAAGGCACGCCATCACCCATTAACGGGCTCTGACTACT 
TGTAAGCACACGGTTTCAAGTTCTCTTTCACTCCCCTTCCGGGGTACTTTT 
CACCTTTCCCTCACGGTACTGGTTCACTATCGGTCACTAGAGAGTATTTAG 
CCTTAGGAGATGGTCCTCCCAGATTCCQACGGAATTTCACGTGCTCCGTCG 

TACTCAGGATCCACTCAAGAGAGAATATGTTTTCGACTACAGGATTATTAC
CTTCTTTGATTCATCTTTCCAGATGATTCGTCTAACATGTTCTTTTGTAAC 
TCCGTATAGAGTGTCCTACAACCCCAACAAGCAAGCTTGTTGGTTTGGGCT 
ATTCCCGTTTCGCTCGCCGCTACTCAGGGAATCGATTTTTCTTTCTCTTCC 
TCCGGGTACTAAGATGTTTCAGTTCTCCGGGTCTGCCTTCTGACATGCTAT 

GAATTCACATGTCGATAACATGACATAACTCATGCTGGGTTCCCCCATTCG
GAAATCTCTGGATCAACGCTTACTTACAGCTACCCAAAGCATATGGTCGTT 
AGTAA03TCCTTCATCGGCTTCTAGTGCCAAG6CATCCACCGTGC6CCCTT 
AATAACTTAATCTATGTTTCCACCATATTTTGAATTGTTATTCAAAATAAA 
TAGCTAAAACTAGTTATTAATCTTGTGAGTGTTCTTTCGAACACTAGCGAT 

TATTTATGAATTCAAGCTTATTTAAAACTCTATTCACTCGGTTTTGCTTGG
TAAAATCTTACTTACTTATCTAGTTTTCAATGTACAAAGAATGTTAATAAA 
CATTCAAAACTGAATACAATATGTCACGTTATCCCTCATCTTCGTAGAAGA 
TGTTCCGAATATATCCTTAGAAAGGAGGTGATCCAGCCGCACCTTCCGATA 
CGGCTACCTTGTTACGACTTCACCCCAATCATTTGTCCCACCTTCGACGGC 






WO 01/34809



PCT/US00/30782



TAGCTCCAAATGGTTACTCCACCGGCTTCGGGTGTTACAAACTCTCGTGGT 
GTGACGGGCGGTGTGTACAAGACCCGGGAACGTATTCACCGTAGCATGCTG 

ATCTACGATTACTAGCGATTCCAGCTTCATATAGTCGAGTTGCAGACTACA 
ATCCGAACTGAGAACAACTTTATGGGATTTGCTTGACCTCGCGGTTTCGCT 
GCCCTTTGTATTGTCCATTGTAGCACGTGTGTAGCCCAAATCATAAGGGGC
ATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCA 
ACTTAGAGTGCCCAACTTAATGATGGCAACTAAGCTTAAGGGTTGCGCTCG 
TTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAACCATGCA 
CCACCTGTTACTCTGTCCCCCGAAGGGGAAAACTCTATCTCTAGAGGGGTC 

AGAGGATGTCAAGATTTGGTAAGGTTCTTCGCGTTGCTTCGAATTAAACCA
CATGCTCCACCGCTTGTGCGGGTCCCCGTCAATTCCTTTGAGTTTCAACCT 
TGCGGTCGTACTCCCCAGGCGGAGTGCTTAATGCGTTAGCTGCAGCACTAA 
GGGGCGGAAACCCCCTAACACTTAGCACTCATCGTTTACGGCGTGGACTAC 
CAGGGTATCTAATCCTGTTTGATCCCCACGCTTTCGCACATCAGCGTCAGT 

TAGAGACCAGAAAGTCGCCTTCGCCACTGGTGTTCCTCCATATCTCTGCGC
ATTTCACCGCTACACATGGAATTCCACTTTCCTCTTCTGCACTCAAGTTTT 
CCAGTTTCCAATGACCCTCCACGGTTGAGCCGTGGGCTTTCACATCAGACT 
TAAAAAACCGCCTACGCGCGCTTTACGCCCAATAATTCCGGATAACGCTTG 
CCACCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGTGGCTTTCTG 

ATTAGGTACCGTCAAGACGTGCATAGTTACTTACACATTTGTTCTTCCCTA
ATAACAGAGTTTTACGATCCGAAGACCTTCATCACTCACGCGGCGTTGCTC 
CGTCAGGCTTTCGCCCATTGCGGAAGATTCCCTACTGCTGCCTCCCGTAGG 
AGTCTGGACCGTGTCTCAGTTCCAGTGTGGCCGATCACCCTCTCAGGTCGG 
CTACGCATCGTCGCCTTGGTAAGCCGTTACCTTACCAACTAGCTAATGCGG 

CGCGGATCCATCTATAAGTGACAGCAAAACCGTCTTTCACTATTGAACCAT
GCGGTTCAATATATTATCCGGTATTAGCTCCGGTTTCCCGAAGTTATCCCA 
GTCTTATAGGTAGGTTATCCACGTGTTACTCACCCGTCCGCCGCTAACGTC 
AGAGGAGCAAGCTCCTCGTCTGTTCGCTCGACTTGCATGTATTAGGCACGC 
CGCCAGCGTTCATCCTGAGCCAGGATCAAACTCTCCATAAAAAATTATGAT 

GTTTGATTAGCTCATAAATACTAAATAGTTTGTAACGTTTCGTTACTGTTT
ATTGGAATTAACGTTGACATATTGTCATTCAGTTTTCAATGTTCATTTAGT 
AAAATCAATCTTTTTTATTGTACTATATTGTTTTAAGAAAAGTCAATAACT 
AATTGATATTACTTAATTAATTTTAAAAGCTTTTAACGATAAGTCTTTTTC 
TATATTCTTTATAATCATCCATTAAAGTTAATTCATAACCTACCTCATCAA 

ATACTTCAGGTTCAATCTTTTTGATTTTATATAATCCCCCAAAAATAAAAT
ATTCTGG 
TATAACTTCCGTGAAACTTTAGCAGTTTTGCATCATCTTGTGTAATGGATC
CAGTGATTGGATTAGCTAAACTTTGTTCAATAGTTCCACGCAAAAAGTCAC 
TATTTGCTTTTAAAAATTCCATTTCATC7VAGATTTTTATCTAATTCTTCCG 
AAATATGATTATTTGTATTAACCATGTATCCCCCGTCCCTTCTATTAATAC 

ACGTCTCTTTGATATCTTTTATCTCTTTTCATTTGTTTTAAGTATTC

GCATCTGTTTCAGATAGGTTTTGCTCTTTGATTAACACATTTTTAATCGCT 
TGATGAACATCCTTTGCCATTTTACTTTCATCACCACATACATAAATAGTA 
GCGCCATTTTCAATCCATCGATTAAATTGTTCACTATTTTCTGCAATTTTA 
TGTTGCACATACACTTTTTTATCAGTATCTCTAGAAAAAGCAACATCTAAT 

TTTGATAAAGTTCCATCTTCAAGCCATTCTTGCCATTCCGTTTGATACAGA
AAATCTGTAGTGAAGTGTTGATCTCCAAAGAATAACCATGTATTTCCTTCA 
AAACCTAGTTCCTCTCGTTCTTGCATATAGGATCTAAACGGTGCAACACCT 
GTCCCAGGACCTATCATAATCACAGGTGTTGATTCATCTTGCGGAAACTTA 
AAATTCGGATTTCGTTTTAAATAGATAGGAATTGTATCGCCCTCTTGTATT 

CTCTCTGCAAATTGTACTGAACAAACACCTGAACGTTCCCGACCGTGTGCT
TGATATCTAACTGCTCCAACAGTAATGTGAACTTCATCTGGTGTTGCTTTA 
TAACTACTAGATATTGAGTACTCTCTAGGTGGTAACTTTCTTAATAATTGA 
TGTAAATTTTCAGGTTGTAGTTCTGTCGTTGCGAAGTCATTTAATAAGTCA 
ATCAAATCCCTTCCCTCAACGTAGTTTTGAATCCATTCTTTATCTTGAATT 
TTTTCAGAAAGCTCTTCATTATCAAAAAATATCGCAGCATTTTCTATTAAC
GGTTTTGTTAATTTAGTAATTTCAAAATGCGATGTTAATGCCTCTTCAAGA 
TTTAAAGTATCTCCATCCTCGTTAATTAAAACTTGTGTTTCTGGACTCCAA 
CCTAATGTGCTAATAAGTAAGTCTACGATAGCTGGATCATTTTGAGGCAAG 
ACAACTACACAATCTCCCGGTTCATATTCCTCACCAAAATTATCAAGTAAT
AACTCAATATGTCTCGTCTCTTTGTCTGAACCTCTACCATTCAAATTAATA 
TTGGTCAATACTTCAGCATCGTAAGGATTAGACTTACTATATTTCTTTTCT 
TTTGCTGATTTTATAGATTCGCTAACAACTTGTTCACTTTCAGTACCAGCC 
GGTGTTGAATCAATCGTATTAATAACATTGGCCATCCACTTTTCAGCGTCT 

TCTTCATAATCAACATCGCAATCAGTACGATGATAAAGTCTTTCTGCCCCT
AATTCAGCCAAACGGTTATCAAAATCTTTACCAGTTTGACAGAAGAATTCA 
TAGGTTTGATCTCCTAATGCTAATACTGAAAATCTCACCCCATCCAATTTT 
GGCGCTTTACGTCCGTGGATGTATTCATGTAATTCAACAGCGTTATCTGGT 
GGATCTCCTTCGCCATGTGTAGATGTGATAATAAATAAATCTTCTACTTTC

TTCAAATTCTTAGGTTTAAAGTCATCCATTGATTTGAGCGTAACGTCATTT
CCAATATCAGATAAACGTTGTTCAAATATTTCTGCAAGTCCTTGCGCATTA 
CCAGACTCAGATCCATATAAAATTGTTATCGATCTAGCTTCCGGCTCAACT 
GAAGGTTCCTTTTCATGTAGCATCGCTTCTGTATTGTCATCTGAGTTATGT 
TGTTCTACAGAATCTGTAGATGTATTTGATTGTTGATTTGCCATCAGATAG 

CCACTTAACCATACTTGTTGGTTAGATGATAAAGTGTGAAGCAATTCATTG
ATTTGTTTAGCTTGCCCTTCCGTAAAAGGACTATTGGTAACACTTAAGTTC 
AATCTTATCCACCTCTTATACTTTGTTATTCCGATATTTGTCATTATTTAT 
ATATGTCTATTTTTGAAAGTTATTAATTATATCTTCAATTCTAGTATGTAC 
CATTTGTTATTTTCAGTAAATGTAATACTCAATTGATAACTAAAATCGAAA 

AGACAATTCATAATATCAAAGGGCAATACAAACTTTTATTACTCAAAATTA
TCGTTACTAAATAAAGCTGTATCAGCTTTTAACAAACAACTCTTTATTTCT 
TATCGGAATACTATGTTTTGAAATTAAAATTTTTTGACTATCTCTTATGGT 
TTATCAGCTACATGTAATCCACATTCAGTCTTACTAGAATTGGACCAACGA 
CCAGCACGTGAATCATTAGAATCAAATACGGGTGATGTACATGGAATACAA 

CCAATACTTGGATAATTTTGATCATGTAATTCATTATATGGTAAATCCTTA
TCACGTATATAAGACCATACTTCTTCTTCTGTCCAATAGATTAAGGGACAC 
ACTTTAATTGACTT7WVTCTTTCATCTTTGTTAATGAAATTTGTATGTGCT 
CGTGTTGGTGATTGTGCTCGTCTAAGACCTGATATCCAAGCTACAGCACCA 
GATAATACGTCTTCTAGTGGTTTAATCTTGCGTATGTAGCAACATTGGTTA 

GGATCATTCTTCCATAAAGCAGGATTATATTTCTCACCTTGTTCCTCTAAC
GTAAGTTCCGGTTTTTTCATTTTAATGCGTAATTGCGGATATTTATCTTTC 
ACCCTATCTATTAAGTCATATGTTTCTTGAAAATGTAAATCAGTGTCTAAA 
AATACAATTTGTGCGTCGGGTTTAATTTGAGAAATCAAGTCAATCAAAACC 
ATACTCTCAGCACCAAAACTGCAAGAATAAACAATATCATTTTCATAAGTT 

TGATATGCCCATTTTAAAATCTCATAAGCACCTTTTGTCTCATCATTAATA
TCTAATTCATTTATAAATGGATCGTTTTGAAAATTGTCATAAGTAATTCTT 
TCTATAGACATTAGCGGAGCCCCCTCTCACCTTATCAATCCATCCAAATCA 
ACAAAACATATCAGAATTGTCAGAATAATTATGTTTAATACTTTAACTACT 
TTTATAACGAAAGTCAAGGTAACTCGAGAAATTATCGAAATAAAGATAAGA 

TTAATTTAAATTTTTTGTCCTATATGCTCATAAAAAATTTAAAGAGCCTAC
ACATATGATATGGCAGACTCTTTAATAATTTCTTTTACATTCGTTTATAGT 
GACTTCATTCAAAATATTACAAATGGAACAAGATTATTTTATTCTTTAATC 
CAAGAAGAATATAGGCATTTATTTTAATAATTGTCTTATTAATTTTTCCAT 
TTGGTTAGGATTATCACAAGGTAAAAATCGTTTAACAATATTTCCCTGTTG 

ATCTACTACAAATTTAGTAAAATTCCATTTAATTTGCGAACCAAATAATCC
TGGTTGTlTACATTTTAATAATGTGTACAAAGGGTGTTCATGCTCTCCATT 
AACATTAATCTTAGCATGGATGGGGAAAGTAATACCAAACTTATATTTATA 
TACTCGATAAATATCTTTGATTAATTCTGGCTCCTGATTATTAAAATCATT 
GCAAGGGAAGCTCAAAATTTCAAGACCATACTTATGATATTTTTTATAAAG 

CATTTCTAATTTATTAAATTGATCATTCAGAGTACAATTCGTTGCAGTATT
TACGATAATGAGCACTTTCCCTTTATATCTTTTTAGTAAATAGGTAGAGCC 
ATCATAATTCTTTACCGCAATATCATAAATACTCATTTAGATACACCTCTA 
CATCATTGCCAATTTCAATAAATCTATAACGCCACTAGAAGCTCCAGGAAA 
TGCAAAT^TATTTTTATTTAAGTCTTGTGCAGTAAGTTTTTGATTAATGA^ 
AAAGACTAGGAGATTAACCAAATCACCAGCGTCATTACCATAAATCTCTGC
TCCTACTAAACGTTTGTGACTATCTA 



Sequence 3358

step.1001h0?.cons.ok

TAGGCTAGTGGTACTAAAACACGGTCTTTATAAT6TTTGAATCTGTTGTAA 
TAAAAACTATCTTCTCTATCTGCAAAATCTTTAGTTTCAGAGGTATCCTCC 
ATAAATGACCTT^TATAGGTAACTCTTCTTCAGATAAAAAGCGGACTTTA 

ACTCCATTTTTCTTAACTTTTTTAGTATTACGCTTTCTTAAACCATCCATG
TTTTTTAAAACATCATTAGCACTTTTGTTTGCTAAATTTAGAACAGAATGA 
TATCGGATTTGTAATACAGGATCAAATCCTTTGTGGAATCCTTCGTGTTTA 
TATCCTAAACTCTCTAATTCATCAAAAATCCAATCATGACCTGCATTTCCA 
GTTATTTCTCCCTCATGATTTAAATATTGATATGGAAGGTATGGGTCAACT 

CTTAAATATAAACAATTATATTTTTTTACATATTTACTCAATTCATTAAAG
AAAAAATGTACAAGCTCTTTATTATTATAATCTATTACTGGACCGCGATTG 
GAATAAAAATATTTAAATATTTTCATTACAGGAACAGCTGTTAATAAACAA 
GCTGCAATCACTTCGTTATCATTATTTTTAATTCCAACTAAATGTGACTCG 
GTACCTTCAGCAACCTTTAATTCGTAATTACCTTCCATTTGTGTAAAATGA 

CTATATGTCATACGATCAGTAAAGTCACTAAATTCTTTAGCTGTCAAATTC
GTAAACTTCATCTTCATAACTCCCATTTATATCTTTATATCTATTTCTTTT 
TATTTTTAGAAAAATAGGTCGTTTTACTTCACCTTTTATATTTAACATTTG 
GTTCATTTAATACTATTTATATTTCTAATCTAAACAAGTATACAATAGTTG 
TAGTG7UVAACTAAAATATTATCCTGCAAAACTTTTATATAAAATTAAAAGT 

TAAATCTCGTTTATTCTGTTTATGTATAAAATAATATGGGCTAAACACTAA
TGTCGCCCATATCACTAATTATTTACTATTATTTTAATTCATTTGCAAGTG 
CTTTACCAATGTCTTCACTTGATTGGGGATTTTGTCCAGTGATAAATTGAC 
CGTCTTTTTCTACGTGAGATGTAAAGTCATCTTTCACTACAAAATTTGCAC 
CTTGCTCTTCTAATTTAGATTGAGTTAAAAATGGTACTTTATTTTCAAATC 

CCATTGCTTTTTCTTCACTATCAGTAAATGAAGTTATTTTGACACCATCTA
CTAGATAGTGATTATTTGCATCTTTTACACCTACAAACGCACTAGGTCCAT 
GACATACTGAAGAGATTATTTTATTACTATTTTTAAATTGAAGTAAAATAT
CAGCTAATTTCTCATTATTGGCAAAATCGTATACAGTACCATGTCCACCTG 
GTAGATAAATAGCGTCATACTCATCTGCATTTACATTTTCGATACTAGGTG 

TATCGTTTAAGTGTGACACGAATTTAGCATACTGATTCAGTGATTCATTAG
AAACAGAATTAGGATCAAGATTTACTTTTCCACCTTTAATAGAAATAACAT 
CAACATTGATACCTTCTTCAGTCAATATATTATATGGTGCTCCAGCTTCTT 
CTAACCATAATCCAGTTTCTGTACCGTCTGTAAATTGACTTGTACTTGTTA 
AAACAAATAAAACTTTTTTACTCATGAATATATACACCTCTTCAAATTAAT 

TGTGATTATTATTTACCCACATTAGTAGTAGATTATTCAATAGTGATCATT
TAATTTTTTGAAGTTACTATTAGTATAGTAGTTAATTCATAGAGTAATCAT 
TGTTTTTGATATGTATATTTTTCATCTATTTTCTTAACTTTGATTTAACTT 
AATATTAATTTTTATAAAGAATTCAACGTAGTTCTTATTGATTTGATATAA 
GTGATAAATTCTTTTCTTGAATCTATTTCAATACGTTTTACAATTTCACTA 

CCAATTACAATACCATCTGCAACGGACGCTATATCTTTAACATGTTCAGGA
TTTTTGATACCAAATCCAGCAACCACAGGAATTTTTGAAACTTTTTTTATA 
TATTCAATTTTTCTCTTTAAATCTGGATGGAACTCCCCACTGTTACCTGTT 
GTGGCATTCATTGTTACCGTGTAAATAAATCCTTCTGAGTTCTTTGCAATT 
TGCATAATCCTAGCATCACTTGCGGTCATGGCAATTAACGATATTATTTTA 

ACAGAATGATGATAAAAATCTTTTTTAAACTTTTTTGTAAGTTCGTAAGGT
AAATCTGGAATAATTAAACCATAAACACCAGCTTCATCACACTTATCCAAA 
AATAATTCTTCTCCATAAGCACTTAGAATATTATAATT^GTCATTAATACA 
TACTTAGATGAAATAGTATTTTTATTTTTTATTAATTCATCAAAAATGAAT 
TTAATGTTTGAACCCTCGTCAATAGCGTTGCGCCCTGCTTTCATGATTATA 

GGTCCATCT6CAACAGGATCAGAAAATGGCACACCAATTTCAACAATGTCT
GCTCCATTCTCAGTTAATGTTTTTAAATGAT6AATAAAATTTAAATCACCC 
ATAATATACGGAATGAATAATTTACTCATTTTGTTCACCACCGTTTTCTTT 
GTATTGTTTAATTGTTTCCATATCTTTATCTCCACGACCTGAAATAGTCAC 
AACAATAATTTCTTTTTCATCCATATTTGGCGCTAATTTTTCAACATAACT 
CAATGCATGTGCACTTTCAATTGCTGGAATGATACCTTCAACTTTTGAGAA
TGTTATAAGTGCTTCCATAGCTTCATTATCTGTAGCACTTACATATGATAC 
ACGACCAATATCATTATAATACGAATGTTCAGGTCCAATACCTGGATAATC 
TAGTCCCGCTGATATAGAGTGTGCCAATTCAATTTGTCCATCATCATTTTG 
AATAAGGTACATTTTGGAACCATGTAATACACCTGGTTTACCTTTCCCTAT
AGCTAAAGCATGATTATGCGTATGACTTCCTTTTCCCGCAGCTTCTACCCC 
ATATAATTTAACATCATCTTGTATAAATGGATAGAACGTACCTATTGAATT 
GGATCCTCCACCAACACACGCGACTAACGCATCTGGAAGTCGTCCTTCTTT 
ACTT 


Sequence 3359 

step.1001h09.cons.ok

CACAACTGAAGCTTGTAAACCTGGCTTAATAATTGTATTTGCTCCTACTTG 

TTGATCGTATTGTTCATATAAATGATGTTTAGATGCTATTGTTGGGTGTTG
TAATAACTTGATAAATGTATCATGTACATCTATATTGCTGTAATCGTTTTT 
TGATGTATTATATTCCTTTTCTTCTCCTTCTAAAATATACACTGGTGCCTC 
ATCAGCAAGAGGTTGCACAGGAATGTCTGCATACACTTCATCATCGTATGT 
AAGCACGAATCTATCAGTGTCTGTTACTTCACCGATTACTGCACTATCTAG 

TTCGTGTTTATTGAATAT^TCTAAAAACTTTTGTTCTGTACCTTTTTCTAC
AACTCTTTAAGCCCATGTTTTTTATAGGTTCTGTTGCAAAGTAAAAAAATA 
TAGCTAACCACTAATTTATCATGTCAGTGTTCGCTTAACTTGCTAGCATGA 
TGCTAATTTCGTGGCATGGCGAAAATCCGTAGATCTGAAGAGACCTGCGGT 
TCTTTTTATATAGAGCGTAAATACATTCAATACCTTTTAAAGTATTCTTTG 

CTGTATTGATATTTTGATACCTTGTCTTTCTTATTTTAATATGACAGTGAT
CTTGCTCAATGAGGTTATTCAGATATTTC6ATGTACAATAACAGTCAGGTT 
TAAGTTTAAAAGCTTTAATTACTTTAGCCATTGCTACCTTCGTTGAAGGTG 
CCTGATCTGTAATTACCTTTTGAGGTTTACCAAATTGTTTAATGAGACGTT 
TGATAAACGCATATGCTGAATGATTATCTCGTTGCTTACGCAACCAAATAT 

CTAATGTATATCCCTCTGCATCAATGGCACGATATAAATAGCTCCATTTTC
CTTTTATTTTGATGTACGTCTCATCAATACGCCATTTGTAATTAGCTTTTT 
ATGCTTTTTCTTCCAAATTTGATATAAAATTGGGGCATATTCTTGAACCCA 
ACGGTAGACCGTTGAATGATGAACGTTTACACCACGTTCCCTTAATATTTC 
AGATATATCACGATAACTCAATGCATATCTTAGATAGTAGCCAACGGCTAC 

AGTGATAACATCCTTGTTAAATTGTTTATATCTGAAATAGTTCATACAGAA
GACTCCTTTTTGTTAAAATTATACTATAGATTCAACTTTGCAACAGAACCT 
ATATTTTGCTATAATTATTAAATTAATTGCCTTTGTGTTAGTATTCCCTGG 
ATTACTAACACTTTGGTTAGCTGTTCTAAGTGATACAGGTGCAGCAATATT 
AGTTATATTAAATTCATTACGTTTATTAAAAAAATAAAAAAGCCTAAATAC 

TCTCTCGATTTGTGAGAAAGTATTTAGGCTTTTTGAATTAATAAATTTCTT
ACTTCTTATGTTTATTTAACAATTAGTACTGGGACAGAACTTCTTTTAGCA 
ATCTTATGTGTAACATCTCCTAAAACATGTTTAATATTCAAATCGACTCTA 
TGATTACTCATAATTACTAAATCAAATTCACCATTATTTATATCTCCAGAA 
GTAATTTTTTCTAAGATTGTATCTTTAATATGCCCAAAATCTACATCTATT 

TTATATTGAATATCTCTTTTTTCTAATTCACGTAAAAATGGAGTGAGTTTT
TTCTCTTTTTCTCTAATAATATCTTCTTTATGCTTATTATAATACTTGACA 
CTAATTGCAAGATCATTTTCTGTTACAACGTGATAAATTACAACAACTGAA 
TCTTCATCGGTGACTTTTTCTAAATAGTCAGGGATAGCACTAAAATCATTT 
TCGAAATCATAGGGTAATAATATATTTTTGAACATCTACACCAGCCTCCTA 

TTACTTATTATTTAGATGTCTATTACTTATATATACCCTAAATATGTATGC
TTATTCTCATATTCAAATTATTCAAATATATTCCTTATCATGTATTAGATT 
CTATTTTTTATATTATTTTTGTAATTCTTTGTATAAACCTATTTATTTATA 
AAGCGCTCTAATACATACCTAAATATCTTCAACTTATAAACTACTTTTTAT 
TTCAATGCCTTTTTCCAAACTTCTTTTATGAGATATTGTGGTTATTACCTA 

ACTCTCATTATTAATGTTATTATTTTATTTATACATAGTAAATGTAACTGT
AAGACATTCTCACCTAACATGTTTTGTTATTACCTGACATTTACAATTTAC 
ACATTTCGCTATTATTACGCAAATTATTTTGATTTTATAATTATGATGACT 
ATAATCTATTCTAATTAATCTCTCCTTTTAGGCATAAAATAATCTACTTAT 
AATTAAAGCATCCACTATTTCAAAATTATTTGTAAATTGAGATTTATAAGT 
GATTTTTATACAGTTATTTAAAATCTATTTCATAATCAAAAATAATAAAGG
GGGTAATAAAAATGAAAAAAGAAAAACGTTTAAATCTTATCTTAACTGTTA 
TACAACAGAATCAGTTTAATAAAAAACAACAAATAGTGGATTACATGGCAA 
GACATTTTGGAGTTTACTATAGCTTGACAACTATTTCTCGTGACTTACAAG 
AATTAGAAATTTACAAAATCCCTGTTGAAAATAAAAAGTATATTTACAAGA
AAATAAATCAAACAAATCAATTAAGTGCAAAAAAACAATTAGAAATATTTA 
GTGATGAGATTATTGAATTTATAACGCTAAATAACTATGTCTTAATAAAAA 
CATCTCCTGGCTTTGCTCAAAGTATAAGTTATTACATAGATCAATTACAAA 
TGAAAGAAATATTAGGAATTATTGGAGGTAACGATACTTTGATGATTTTGA 
CTTCTTCAAATGAAATAGCAGAATTTGTTTGTTATCAATTATTCCCTTAAA
AATTAAATGATTATATTAATAGGAATGATTACTAATACAAAAAAATGAGTA 
AAAAATCTTTTTGCTTGAATAAGATAAGGTTTTAACAATGAGATTTT 



Sequence 3360

step.1002a01.cons.ok

AACACAACTCTTATTAACTTTTGATAAGATTTTATAGTATTAAAGTTCCAT 
TCTTTACGAGTATTCATAGCTTCTAAATAATACCCAGTTATAATATCAAAT 
GCTTTCTTCTCTAAGCTCTCTTCTCTATTTGAAACATGATAAGGAGCACCA 

CCTTTAGTAACCCACTTATCTTTTTTATAATTAGATAAATAGGCAGTATAT
AAAGAGATAATGGTCTCTTTGAAACCGAAAAATTTATACTCTTCTTTTAAA 
GAATCTTCACTAGCCACTATAGAAAATAGATGAGCGTTAAAAACTGAGGGG 
TTATAACTTTCTGATTTAGGTGGAGCACCACCACCACCAGAATTATCAATA 
GGCAAACCATCTTTATATTCTACTCCAAATAATTCTCCGATTTTAGCGCTC 

ATCTCATCAGCATCAAATTCTATACTTTTATCATTTTCTTTTTCAAACTGT
CTTAAATACTCTTCCTGATCTTTTTCAATACCTTTACGTGTTACTTTTATC 
GATCCACTAGGGAATTTGAATTCACCCACTGTAAAATCAGTCAAAATTTTT 
TTTATTTGTTCTTTAAATAAAAAAATGATGATAAGAAATAAAACTGGAAAA 
GAAAAATTCTGATACAAAAATTTAACGAAACTAAAAATTGATTTTAATATC 

TCATAAATACACTCCAAAACTTACCTCCAAAATTTTAATTGACTTTTAGAT
TATTACACTAAGAAATCATTAATAGTAAAAACATATATTTATTCTAATTAC 
TTTATTGACGTTGATCCTCTGAACCCTTAACAAATCCAAAACTTGTCGAAT 
GGTTGGTTTAATAACTTACGCTATGCCGACATTCGTTTCAATCTGTTTGAT 
AAGTTAAAATTTCATATATAGTCGTGAATTTCAATGTTCTCTTTAAAACAA 

GGATCGTGAAAATATGCCTAAAAATCAAAACTAAATGATTCATTAGTCTTT
AATTATTTTCTGTTACTTATAGGTGTGATAACCTTTTGTGCTTATTATTTT 
TTAGGTGAAGTTAATGTTAATATAGATTTATAACGTTTAAATTGCATGTTT 
TAAATTGTATTTTCATGTTTGCATCTCTCTTTTTTTATTTATCTCTTAGTC 
AAGATTTAAAATCTTAACTAAGAGATTTTTGTATTCTTTACATAATAAAGT 

40 CTTATGGGTAGTTGAATTGTATACTCTATTCATTTACTGAATTATATCAAT 
AATTTTATTCATCTTATCTTTATCACGTGATACTACGCCTACAACTCTTTT 
ATAACTTTGTAATTTTTTTTGATATAAGTTCATACTTTCAAGTCTTGTAGT 
CAAAGATTGAGGAGTCAAAGTAATACCTTTATTTGCTTTGACATTAGCTAA 
TATAGTTTCATAATAATCAACATAATCAATATGTAACTTATCTTTCAAGGA 

45 ATAATTTTTAAGAAGATCATATGTATCACATGGCGGTGTTTGTATATAAAT 
ATTCTCATTTTTTAAAGATTTTATACTTACTTGGTTAGTGTTTTTAAACTT 
ATTTTCATCACCATAAGCCACAATAAAGTCCTCAGTGTAAATTTCTTGTGT 
ATAGAGCGAGTTATTTTTTAAATGTGTAATATCACCAATAACAACATCTAT 
ATTATCTTTATAAATTTCCTCAAGTAAAATTGAAGTACTATTTTCAATAAC 

50 TAAAACGACACTTTCATTAATATAATTCTTTTCATGTAGCTTATATAAAGA 
AAAGCTAGGTATGACTCCTAATTTAATTCTTTTATCTTTAGACTCACTTAT 
TTTACTTTTTAATTCTTCAAATTGCTCGTTTATTTTTAAAAACGAATGATA 
AACAGCATTTCCTTTTTGAGTTAAAAAAATACCCTTAGAAGTTCTATAAAA 
TAACTGACAATCAAAAT7VATCTTCAATACTTCTAATTCTTTTACTTAAGGC 

55 TGGTGTACTTATATTCAATTCTCTAGAAGCTTTATTTAAGCTATTGTACTC 
GACAACTTTAATGAAGTCCTTCATATAATTAAAATTCATAAAATCCCCCAT 
GGTATTAACTATTTGTTTATACCCATATTAATACATTACTTATTCATATAT 
TTCAAGTAAAGGTTTATCCTTCTATTGTTCCCAAAAATAGGCAACTAATCT 
CTTAATGAATCAAAATTGACTGATAGGATGAGATAATAATGAAAAAAATAT 
GGATTTTAACTATAGGTATGTTTGCCTTAGGTATGGATGCTTATATAGTAG
CAGGATTAATACCTTCAATAAGTAAAAGTTTTAATAAAAGTAGCTCTGCTA 
TTGGGCAAGGAGTAACAGTTTTTACATTGTTTTTCTCTATCTCTGCCCCCA 
TTTTTTCAACAATATTAGCTAAATCCCCAGTTAAAAAAATACTAATAATAG 
5 CATTCAGTATATTTACTTTAGCCAATATTATAACCGCAATATCTATGAACT 
ACATGCTATATATCGTATCAAGAGCAATCGCTGGTTTAGGAGCTGGCGTAT 
TCTCACCAATTGCT^TAAGTGCAAGCAATCATTTAGTCTCCGAAAAGCATA 
AAGGAAAAGCAATCGCTTTTACAGTAGGCGGAATGAGTGTAGGAACTGTTA 
TAGGAGTTCCTCTCGGACTAGAAATTGCCAACATTTCTAATTGGCGATTTG 

10 CAATGTTGGTTATTATTGTCATTAGTTTTATTGCATTAATAAGCATATCTA
TATTGATGCCTAAATTTAAAATAGAAGCTCCTCCAAATTTAAAAGATCGTT 
TTCAATTATTTTTAAACAAGCATGTACTAAGAGTTATTTCGGTTACATTAT 
GCGCTGCCATTGCTAGTTTAGGTTTGTATACTTATTTAGCCGATATTATTA 
AAACAAATACAGATACAAAAAATTTAACTCATTACCTTACAGCGTGGGGAA 

15 TAGGCGGATTAATAGGAAGTTTTGGTATAGGATTTATTATAGATAGATTTA
AAAATACAAGATTTGTTATGCTAATTATTTTAATTTTACTAGCATTAAGTT 
TTGGTTTAATTCCTATTTCTATTAACTTGCCTATATTAGGTTTAATTCCCT 
TTATTTTATGGGGAGCTATGGGATGGGCTACACAAGCTCCTCAACAACATA 
TATTATTGAAAAAACATCCTGAATATGGAGGCTCTGCTGTCGCTTTAAATA 

20 GTT 



Sequence 3361

step.1002a02.cons.ok

25 TTAATTAATTTTATTTATTTAATTAATATTTTGTAAAATTAAATACAATAA 
TACATATTTATGATATGTTTTTAAATGTAATAAAGTCAAATAAAAGGAGGA 
AATTTATGAAAAAGTTAATGATGAGCTTATTATGTTGCACAATATGTGGAA 
CAGTTTTAACTACAGGAAGTGTAAATGCACAAGGTGATAGCTCAACTAACT 
CTGAGTCGTTAAAGGAACTTCAAAATGAAGGTATTGTGTCTAAATCAATTA 

30 CTGAACAACAATGGCAAC7ULATGAAAGCTCAAGAGC6TAAAGATGAAGCAG 
AATTTGAGAAAACAGCTGAAGTACAATGGCAGAAACAACAAAAACAAGATC 
GAATAGATCGAGAAAATCGTTCAAGAAAAAAGAAATTTCATTTGAAAAAAG 
GTGATTTTTTCATCACAAATAACGTAAGTTCAAAAGGGTTTACAGGTCATG 
CTGCAATATATACTGGAAAAGGAAAAGTTAAAGAAGCGCCTGGATATGGAC 

35 AACCTGTGAGGGTAAAAAGTTTTAGTGATTGGAAAAAGAGTACTTTGAAAA 
AAAGAAAAGGAAGTCCCAAACATCGTTATATCAAGGTTTATCGAGCACCAA 
AAAAATATAGAGGTAAAGCTGGAAACTATGCGAAATCTCATTTTAACGGTG 
TACCTTACAGTATAACGACGAATCCATATTCTAAAAGCGTTACATACTGTT 
CGAAACTTGTTTGGCAATCGTATTATTATGGTGCCGGACGATTATCAGTTC 

40 TTCCGGTTGTTACATCTCAATTTATTATTGAGCCATATAGTCTCAATAAAT 
ATATTCCATCAAAAGCAGTTAGGTCTTATAAAAGAAGCTAAAAAATTAAGG 
TGTTAAATATATAATTTATACATTTTAATTACCCATAATCGTTAATTATGT 
TACTTTTCTTTGATTTCATCGTTTATTAGAACACTTTGATAATACATCATT 
TATGAACTTCAGTACATATTATGTGATAATCGACCATTAAAAAATACGATT 

45 AAATGATTAAAAGAACAATAAAAAAGCAAGCGACTAATAATATGATTATTA 
GTCGCTTTCAAACTGTCAATAAGCGCTCTTTTCAGATACTTTATTTATCTA 
ACTCTACTGTTTTTATCTTTGGTATTTCTGGAAGAAATGCATACTTAATCT 
TTATAAATAATGAAGCTGGCTTTCCATTGATATCAATACGATAATAACCTA 
TATGCCCACTTCCTTGATTATCAGATTTTCCATCTATAGAAATTTTATTTT 

50 GCTTTTTTAAGAATTGATACGTTTTATTATCTTTAGCAACCTCTTTTAATT 
TCTTAGTATCTTCATGCTTAACATAATTATTTAATTGCTTTTCACTTGAAT 
GACACGCCCATGAATCAAAATAGAAAAAGTTTACAAAAATATAACAAATTG 
CTAAAAAAATTATTATACCAATAACATAGAATACTTTTTTCATTAAAGTTC 
ACCTAGTTTTCAAAAAAATTTTTATAATTTATGATAACTTAAAAGTGTATA 

55 TTTTTTCTACATAATATATTTAATTTTCAAGGAGATGTAAAAAAGTTGAAA 
AATTTCGCAAAACTAATTTTGGTGGGCATTTTAGTATCGGGGTCAGGGATA 
GCGAGTGTACAAACAAATATAACTCACGCAAAAGAAAGTCACGATTCAACT 
CCTCAAAATATTAAATTAGTGGGAACGTATGATACTTCTCAAGTTGATTCC 
AAAACGATGAAACAATTTAAAGAAATAGAAAAAGAAGATAATAATTTCCAC 
ATAACTAAACATGGAAATAAAGTCX3TTGTAGAAGACAAATTACCTAATCCA
GAGAATAAAACTTCAAGTTATTCAGCTGATGGTAGTGCTGAAAATAATACA 

AAAGTAATTAATTTCTCTGATTTTGTTGGTAATATGGATGGGAAAGATGAT 
GGAAAAATATCGGATGGGATAACCTTTTATAGTGGTAAATCATATAACGGA 
5 CAACACGATGGTCAAAAAGTAAAAAAAGGGACTCATGTACATTGTAATAGA 
TTTAACGGAACAAAATCTGATCATAGATACTGGTCAAAAAAACATCCTAGA 
GCTTATGTAGATTTTTATAAAAGTGATTGCTGGTATCACGCCAAAGCTTAT 
AAATGTTGTTCCTTGGGAAAAATGACTAAATGCGATGGTTTGAATAGTATT 
TATAGAAAAGGTGTCAAAGATTGCTCATCATGGAAAGGTAAACCCAAACAT 

10 AAAAACTGGCCTAAAACAGCATGGTATAGAAATTAAAAAATGAATAAAATC
TTAAAAATATTAATAACTTCTATTATTGTTATCATTATTACCTTAACAGTT 
TGGACTTTTAGTGTGATTACTTATCAGAAACACAAGAGTGAGAAAATCATC 
AATCACGTTATAGAACGTAAGGGTTGGGATAAAAAAATAAAAAATGAAAAA 
ATGAGTTTTAATATTATAATGGGATAT6CTGAAAAAGATATTGTTTTTAAA 

15 GATCAACCATATAGTGAGTATGAGTATAACGTGACACCAGCACCATGGACA
GATGATAAAGAATATAAGGTGTGGGGGGAAACAGATTTACAAAAGAAAGAC 
TCCTATTATAAATATCTTTTAGAATCAGAACCTTACAGAAAATAATATAAA 
TGTATAACATAATAAAAAATAGAGATCATTCATGGACTTCTAGTAAGATTC 
AAGGTAGAAATACAGATGGTGGATTAGAATGGTCACCAGATCATAAATCAC 

20 TTATTTATAAATATGATGCAACATTAGGTAGACAAATAAATACTAATGACG 
TGTTAACTTTACTTCAAGCAACAGCTAAAAACTCAAATTTACGTTCAAATA 
TCAATAGTAATGAAAAACAGTTAGCAGAACGAGGGTCTAATGGGTATTCTA 
AATCTATAATTAGAGATGATGGCGAGAAATCTTATTTACTTAACTCAAATC 
CTATTCAAGTATTAGACTTAGTAGAACCAGATAATGGTTACGGTGGACGTC 

25 AAGTCAGTCATTCTAACGTTATATATAATGAAAAAAATTCTTCTATCGTAA 
ATGGTCAAGTTCCAGAAGCTAATGGGGCATCCGCTTTTAATATTGATAAAG 
TTGTTAAAGCTAATGCGGCAAATAATGGTATTATGGGTGTTATCTATAAGG 
CACAATTATACTTAGCACCATACAGTCCAAAAGGTTACATTGAAAAATTAG 
GCCAAAATTTAAGCAATACCAATAACGTGATTAATGTTTATTTTGTGCCTT 

30 CTGATAAAGTAAATCCTAGTATAACTGTAGGTAATTACGACCATCATACGG 
TATATTCTGGTGAAACATTTAAAAATACTATCAATGTAAATGATAATTATG 
GATTAAATACAGTAGCTTCTACAAGTGATAGTGCAATTACTATGACCAGAA 
ACAACAACGAGTTAGTAGGTCAGGCTCCTAATGTTACTAATAGCACAAATA 
AAATTGTAAAAGTTAAAGCCACAGATAAAAGTGGAAATGAAAGTATTGTTT 

35 CTTTCACAGTAAATATAAAACCATTAAACGAGAAATATAGAATAACAACTT 
CATCAAGTAATCAAACACCAGTGAGAATTAGTAATATTCAAAACAATGCTA 
ACCTTTCAATTGAAGATCAAAATAGAGTAAAATCTTCACTCAGCATGACTA 
AAATTTTAGGTACAAGAAATTATGTCAATGAGTCAAATAATGACGTTCGTA 
GTCAGGTTGTAAGTAAAGTAAATAGAAGTGGGAACAATGCTACAGTTAATG 

40 TTACAACTACATTTTCTGATGGTACAACTAATACAATAACCGTTCCAGTTA 
AACATGTGTTATTAGAAGTTGTACCTACTACTAGAACAACAGTAAGAGGAC 
AACAATTTCCAACCGGCAAAGGAACTTCCCCAAATGATTTCTTTAGTTTAA 
GAACGGGAGGTCCAGTTGATGCGAGAATAGTTTGGGTTAATAATCAGGGAC 
CCGATATAAATAGTAATCAAATTGGTAGAGATTTAACATTACACGCTGAAA 

45 TATTCTTTGA 



Sequence 3362 

step.1002a06.cons.ok

50 GGTGCGATATATACTTTATTTCAACCATATTTAAATACTATGATGCACTAT 
TGGTTAATCAAATAAGATTATTATGAGAAAAATAACAATTAAAATTACGTA 
TACATAAGGAGCGTTATCATGTTAGACAAAAATCAATTGGAAAAGTATAAC 
CAAGAGCATTTGTATGAATATGAAAAATTAATGAGTAGTAATGAAAAGAAT 
GCTTTAGATGAAAAAGTAGATCAGTTAAATCTTGCAGAAATTCAAGATTTA 

55 TATCAAGATTTATATGTTAATAGAAAAACTATTGATGATGTATCTTCTGTA 
TCTGAAGTCAAATATGAAGTGAAATCACGACTCAATGAAGAAGAACGACAT 
ACATATGAACAAAAAGGTTATGAGGCAATACGAAATGGTGAATTTGCTGTA 
TTATTGATGGCTGGAGGACAAGGTACGCGTTTAGGATATAAAGGGCCTAAA 
GGTTCTTTTGAAATAGAGGGTACGAGTTTATTTGAACTTCAGGCGCGTCAA 
CTGATTCGTTTAAAAGAAGAAACCGGCCACACAATTAATTGGTATATTATG
ACAAGTGACATTAATCATAAAGATACAATAGAGTATTTTAAACAACATAAA 
TATTTTAACTATGATGCCAATCATATTCATTTCTTTAAGCAAGATAACATT 
GTTGCTTTAAGTGAAGAAGGAAAGCTTGTTTTAAATAGAGATGGACATATA 
5 ATGGAAACACCTAATGGTAATGGGGGTGTATTCAAGTCTCTTAAGAAAGCA 
GGATACCTTGATAAGATGCAACAAGATCACGTCAAATATATCTTCTTAAAT 
AACATTGATAATGTCTTAGTTAAAGTTTTAGACCCGTTATTTGCCGGTTTT 
ACAGTGACACAAAGTAAAGACATCACATCAAAAACAATTCAACCTAAAGAT 
AGTGAAAGTGTAGGTCGGCTTGTAAATGTTGATTGTAAAGACACTGTGTTA 
10 GAGTATTCTGAATTAATGTCATATTTCCAGGAGCTGCACTCCAGTTCATTT
CTGAAACTAAAATACTTCCATCGCTATTCACGCGCTCTACAAACGCTACGT 
GACCATAGTAACCAGCGTCAGTTTGTGCAATTGAGCCTACTGTAGGACGAT 
AATCAATAGTATATCCATCAGCAGCTGATGCATTGTCCCAATTA 

15 

Sequence 3363

step.1002a08.cons.ok

TTACCTCCATTTAAGATTCAAAGTGATATGTCCAAAATGGCTCTATTTTTA 
AGAAACTTTGTGCTTGACCATCTATTTCGGGACTAGCAAGTTGTTTTAAAA 

20 ATGGTCCTGTTTGTGAAGCATGTGCTTCAAAAGCTTTAAGTTTTATATCAC 
TATATTGAGATATGTCATTTTGAATATCAGGTTCTCCAAGAATTTCTGATG 
CATCATTGCTAAACGCTACAAGTGTAAGACGGGGTCGATCTGATTCATGCA 
TGCGTCCAACTGTACGTACTACAGCTTCTGCAGTTGCCTCGTGATCAGGGT 
GAACTGCAAATTTAGGATAGAACGAAATAATTAATGATGGATTTGTTTCGT 

25 CAATAAGTGATTGAATCATTTGATCCATTTGATCGTAAGGTTCAAATTCAA 
CAGTTTTATCTCTTAACCCCATTTTCCTTAAATCTGTAATCCCAATTGCTT 
TGCATGCTTCTTCTAACTCACGTTCACGTATAAATGGTAAAGATTCTCTTG 
TTGCAAAAGGAGGGTTACCTAGATTACGTCCCATTTGTCCTAGGGT/y^GAC 
ATGCATATGTGACGGGAATACCTTTTTCAATATAACTTGCGATAGTTCCAG 

30 CAGACGAAAAAGTTTCATCATCAGGATGGGGGAAAATCACAAGTACGTGTC
TTTCATCAGTCATGATTGGCAACCTCCTTTATATAGTAAATGGCTGTTTAC 
TTATTTCTATTGTAGCAGCCAATTGTCCTTCATAATTAAAACCTGCAATTA 
AAAATTCGTTATTCTCATTAACTTCATAGTGCGTAAGTCCTTGCACGTAAA 
CCCAGCCACCATCTTTTAATTTAAGACCTACACGATATGGATCTTTGTCGC 

35 CGCCTTTAAGTTGTGCATGTTCAAAAGTCACGACAATATTTCTTAAAAATG 
TTCCAGCGTTAAATACGCGTTGATCGAAATGATTTGCATAAGCACCATTTG 
TTGTTTCAACGTGTAGGTAAACAGGCTGATTTGAGTAAGAAGTTAATAAAT 
CTAGCACTTCTTGTTCTTTAATTGGTTCCAACACGTTCACTCCCTTAAACT 
AAACAATATGTCAGTGACGGACTACCGATAAACATATTATTTACCAGCAAA 

40 CTCGTTTATCTATCTATTTTACTAGAAAGCTATGTATATTTGTACACGAAT 
GCTCGATTTAATTAGGAAAAATTCAAGAAATAACGTTGAATATTACATATA 
GAGCGTTAATATGTTTAAAATCTTAATCATTTGTTACAGTATAATTTAAGA 
GATATAAAAGAGTAAGGCGTTAGTCATCAATCATTGTGATATTGAAATTAG 
AGGTGTTTGATATTGAAAATTATTAATTTAGATTCAAAAAATCTTGCGTCA 

45 TTTTATGTAGCATGCGAGTTGTTTAAACAAATACAGCAGCACCCTCATGCC 
AAACTCGGTTTAGCAACTGGTGGAACTATGACTGACGTATATCATTACTTA 
GTAAATTTATTAATCAAAAATAAAGTTGATGTGTCGCAGGTTGAAACATTT 
AATTTAAATGAATATGTAGGTTTAAAAGCAAGTCATCAACAAAGTTACCAT 
ACCTACATGAATAAAGTGCTATTTGAGCAATATCCTCATTTTGTTAAAAAT 

50 CACATTCATATTCCTGATGGCTTATCAGAAAATCTAGAAGCAGAAGCGGAA 
CGATATAACAATTTATTAGATGAAAGAGGGCCAATCGATATTCAAATTTTA 
GGAATTGGAGAAAATGGTCACATTGGTTTTAATGAACCAGGGACTGACTTC 
AATAGTGAAACACATGTGGTGAACTTAACAGAAAGCACCATAAAAGCAAAT 
AGTCGATTTTTTGACAATGAAAAGGATGTTCCTA6ACAAGCAGTTTCAATG 

55 GGGGTAAAAAGTATTTTAAAAGCAAAAAGGATTATCCTACTCGCATTTGGT 
CCAAAGAAAAAAGAGGCTATAAGTAAACTGTTAAATGAACAGGTTACCGAA 
GATGTACCTGCGACCATTTTACACACACACCCTAATGTTGAAGTTTATGTA 
GACGATGAAGCAGCGCCAGATTGTTTATAATTATATAGTTGTTTTATATAC 
AAAATATTTTATATGTTTAATAAGTTTAGCGAAGGGTATACAACTATAAGC 






WO 01/34809

PCT/US00/30782



AAAAGAGAATTTGCCAAAACGAAAGGATGAATTTCATTGGAACTACAATTA 
GCCATTGATTTATTAAATAAAGAAGAAGCAGCAAAATTAGCTCAAAAAGTT 
GAAGAATATGTAGATATTGTTGAAATTGGTACGCCAATTGTAATTAATGAA 
GGGTTACCTGCAGTTCAACATTTAAATGAAAATATTAATAATGCTAAAGTA 
5 TTAGCTGACTTGAAAATTATGGATGCAGCAGATTACGAAGTGAGCCAAGCA 
GTAAAATATGGTGCAGATATTGTTACAATTTTAGGTGTTGCTGAAGATGCT 
TCAATTAAAGCAGCAGTTGAAGAAGCGCATAAACATGGAAAAGCATTGCTT 
GTTGATATGATAGCAGTGCAAAACTTAGAACAACGTGCTAAAGAACTAGAT 
GAGATGGGTGCAGACTATATCGCAGTTCATACAGGTTACGACTTACAAGCT 

10 GAAGGAAAATCTCCATTAGACAGCTTGCGTACAGTTAAATCTGTTATCAAA 
AACTCTAAGGTTGCAGTAGCAGGTGGTATTAAACCAGATACTATCAAAGAT 
ATTGTTGCTGAAGATCCAGATTTAGTTATTGTTGGTGGCGGTATTGCGAAT 
GCTGACGATCCTGTAGAAGCAGCAAAACAATGTAGAGCAGCTATTGAAGGT 
AAATAAGATGAGTGAATTTAATAATTATCGTCTTATTCTTGAAGAGTTAGA 

15 TTCTACTTTATCTCAAGTAGATAATACAGAGTATGAACGTTTTGCTAATGA 
TGTTATAGGTGCAGATCGCATATTTACAGCTGGTAAAGGTCGTTCAGGTTT 
TGTTGCTAATAGTTTTGCAATGCGCTTAAATCAATTAGGTAAAAATGCCTA 
CGTTGTAGGTGAGTCAACAACACCTTCAATTAAAGAACATGATTTGTTTAT 
TATTATTTCAGGTTCAGGTTCTACAGAACATTTAAGATTATTAGCTGAAAA 

20 AGCACAATCTGAGGGTGCAAAAATTGTCTTATTAACTACAAATGCGGAATC 
GCCAATCGGTAATCTTGCAGAGACGGTTGTTGAATTGCCTGCAGGTACTAA 
ACATGATGTTGAGGGTTCGAAACAACCACTTGGTAGTTTATTTGAACAGGC 
TTCACTTATATTCTTAGATAGTGTTGTATTACCTTTAATGGATGCATTTCA 
CATTAGTGAAAAAACAATGCAAGAGAATCATGCTAATTTAGAATAACTAGA 

25 ATGAGAGATGAGCACTT 



Sequence 3364 

step.1002a09.cons.ok

30 AGTATCAGAATTGTCATTAGATGTGTCGTCATTACCACATGCGCCT7VAAAT 
TAAAGCACTACTAAAAATTAAGACTAAAAACTTTTTCATATGTAATTCTCC 
TTTACTATATATCTTTATATTCAAACACTCGTAACGGCTCAAATTGAATAA 
CGTATTTGCTATACCGAGTGGAATAACCATATTTTTGTTTATAATGTTCAA 
TACTTTGTAGGACAAAACCTTCTGTAACCTCAAAAAAATTAGCAAGTTCAT 

35 ACAAGTCATGAATGCCTTGCAAAAATGCTTTTACAATATCTTTAAGAGGGA 
TTAACTTTTCATAGGCTAACCTACGTGCATAACCTTCAAATTTTCTATGAT 
TAAAACTACTTTGATCAACTATATTTCCATATGTAAGTTCGTTATGAGCGA 
GTTCCTCTGCAAGTGTTTCAAGTTTGGAAGTAATAGGCAAGTTACGATTAA 
TTAAAATCATATCTCCAAGCCACAAACCAGATAACCTTTTAGGTAAGTTAT 

40 CACATTCAATGACTTCAATATAGTCATGTTCAATTAACATATCTTCATATT 
TCCCCACATAAAACACCCTTTATTTTCTTTTACTTCTTATGTATTCAGCGT 
AATCAAGAATTTCTTGCCATTCTTCGTCAGTTAAATCACCATCAAGATGCG 
" CTGCACGATGTTGTGGTTGTTCCTCAACTTCTTCTATCCAACCCATCAAGA 
AAGCAGGGTTAACATTTAATGCAGTAGCTATACTTTCAATAGTATCATTTT 

45 TAAGATITTTAATATTCCCGCTTTCATAACGTTGTACAGTAGCTTCAGTTT 
TACCGATTTTTCTTCCTAGTTCAGCTAAAGTCATACCTTGTTTTTCTCTTG 
ATTGTTTCATTCTTTTTGAAAAGCACATCGTAATACAGCTCCTTTTACTTT 
ATAACTATATTATAAGGAAAACTTTCGTGATTTGCAATATTTTTTCTTAAA 
AACTTTCGTAAAATGCTTGACCTATTTTGTACAGCCATGATAAGATTACTT 

50 ACGTAATACGAAAGGTGGTGAAATGAAATGCCTATCGACACTAAACTTTTA 
AAATCAAAAATGGCTTTAAAAGGACACAACATCAAAACTCTTTCTGAAGAA 
ATAGGTGTTAATAGAGATACATTGTCCAATATGATACACGGAAGAACTAAA 
CCTTCATATCCAGTAATAAATGGAATTTATTTCGCGTTAGATTTAACACCT 
CAAGAAGGAAGAGATATTTTTTTTAACAAAGACTTACGCAAAAAGAAAGTT 

55 TTAACTTAAAGGAGGAATTCAAATGACTGAAAACAAGGTATTATACAGGGT 
TTTTGTAAAAGAAACAAAGAATAGTGGATATACATTAGTTTTTGAAAGTTT 
AGAAGAAGTCGCAGCTAATATAAAAAATGCAGAAGAATTTGACGAAATTAT 
AATAAGCCCCTCAAAAAGAGAGGCTCAAGATTAACGATACTAAAGATAATC 
TTTTAAATGTTGTGAACTGCAAATTACATTAGACCATGCGGCTTCACCAGT 
CAGCTTACCAACGAACAAACTATCATCGCTATCAATAAACATTTTTAATTC
GTTTCTAATAGTTTTAGCATCATTAGTTGAATTAATAAACCAAACTGATTT 

ATTGATTTTAGCAACATTAGGATAATCCTCAATACGCTCTATTAATTTTGG 
ATAATCCTTTTGGTTATTTAAATCATACGCAATAATAAAACTTTTCATATT 
5 TTCACCTCCTTTTATAAGGAGTATAGCAGAAAGGAGCATAAACAATATGCA 
AGATTTACAAGTATTTAATTTTGAAGATTTACCAGTAAGAAAAATAGAAGT 
AGATGGAGAACCATATTTTTTAGGTAAAGACGTGGCAGAAATATTAGGTTA 
CACAAGATCTGATAATGCAATTAGAAATCATGTTGATGATGAAGATAAGCT 
GACGCACCAAGTTAGTGCATCAGGTCAA/^CGAAACATGGTAATCATCAA 

10 CGAATCTGGTTTATACAGCTTAATCTTTGACGCTGCTAAACAAAGTAAAAA 
CGAAAGTATTAGAAAGAAAGCTAAACGTTTTAAACGTTGGGTAACCGAAGA 
TGTTTTACCTTCCATTCGTAAAACAGGTACTTATCAAGTTCCTGATAATCC 
AATGGACGCATTGCAACTTATGTTCGACGCACAAAAACAAACCAAAGAAGA 
AATAGCAACTGTTAAAGCAGATGTTATTGATATCAAAGAAAATCAAAAGCT 

15 AGAT6CAGGAGAATACGGATTGATAACAAAAACAGTTCATCAACGCGTTGC
TTATATCAGACAAATTCACGGACTACCTAATAATAAAGAAGTTAACAAACC 
TTTATATAGAGATATTAACAGTAACGTAAATACGATGGCTGGTATTAAAAC 
AAGAACACAATTAAAACAAAAACATTTCGATGACGTAATGAATATGATCAC 
AAATTGGTTTCCATCTCAATCAACAATGTATGTCATCAAACAATTAGAAAT 

20 GGACTTTGAAAACGAAGTATAAGGAGTGATAGCAATGGAATACATTGGATT 
TGCGGACGCTATCGAGTTTGTGAAAATAAGTGGAATTTCTAAAAACGATTT 
AGAAAAGCACGTTTATAGCAATAAAGAGTTCCAAGAGAAATGTATGTACAG 
ATTTGGCAAGAATCATAAACGCTACATCAAGATTAGACCGGCAATTGACTT 
TATAGAACAAAATTTAATGGTGTCAGAAACGTCACTTTAGAGGAGTTTTAC 

25 TGAAGAAAGAGGATATATCGAAAAACTTTTAGAAATGAAAGGAGAGAATGA 
TAAATGAAATATCTATTAAGTTACATGACGATGTTTATCGCAATGATCATC 
ACATTACTTTTAGGAGGTGGTTTCTTTACGGTAATAGCATTTTCAATGTTA 
ACACTTATCTTTAGCACATTCTTCTGGGAAAAGTGGCTTGAGATAACAAAA 
AAGACTGAAACTTGCGCCAACAAGTAACAGTCAGAATCTAATCAAAATATA 

30 CAACTTAATTTAATCAAAATATACGGAGGTAGTCAAGTTGAGACACAAATT 
ATTAAAAACGGCCAATGATTTAAATA 



Sequence 3365

35 step.1002a10.cons.ok

TCTATTTCTTTAAAAATGTTTGTTTCTAAAGATTTTAGCGCACGTTTTTTA 
GTACGTACTTTTTCGTGTGGATGTATTTTACCATCTTCTTCAAAACGCAAG 
ACAGGTTTCATTTTCAATAATGTACCTACCCAAGCTTGAGCTCCAGTGATA 
CGACCACTTTTTTGTAAATTTTTTAAATCATCAACAATTAAGTATGCACCA 

40 ATATGTTGTCTTATTTCAGTTAGTTCATTAATAATATCATCAGGTTTATAT 
CCCTTTTGTACCAATTGAGCAGCGTAAATTGCAAAGCTACCTTCAATCATC 
GCAGCAAGACGGCTATCAAACGTATGTACTTGAATATCTTCAACCATTTCA 
CCAGCTTGTGTTGCTGAAGGATAGCTTCCGCTTATACCACTTGATAAGTTA 
ATCACGATGACATCAGTGTATCCTTGTTCTCTTAATCTCTCAAAATTTTCA 

45 ATCCAATCGCCAATAGCAGGTTGGCTTGTTGTTGGTATAGTTTTAGATGAA 
GCCATTTTTTTATAAAAATCATCTACAGAAAAATCATCACTCTCAGTGAAA 
TTCACTCCATCATCGAAAGTTACACTTAGTGAAGCGACTGGTATGTTATAT 
TGTTCTATTATATGTTGTGGTAAATAACTTGTAGAATCGGTCATAACTGCA 
ATCTTCATCTTAATACTTATCCTCCTCAATTATACTCATAGTTATGATACA 

50 ATAATTTTTTAAAAAAGAAAATATCATCGTAATAATTTAGAACCTATCTTG 
TTACGTGTATAATAATGTTAGAAAACTGAATTGTAGGTGGCAAAAATAAGA 
TGGATAAATCCATAATTACTATTAAACAAGCACATTCAATTGAAAATGTGA 
TAAGTAAATCACGCTTTATAGCATATATTAAGCCTGTTTCGACTGAAAATG 
AAGCAAAAGCTTTTATAGATGAAATTAAAACAAAACATAAAGATGCAACTC 

55 ATAATTGTTCAGCCTATACTGTCGGACCAGAGATGAATATTCAAAAGGCAA 
ACGACGATGGCGAACCAAGTGGAACAGCTGGCATCCCAATGCTTGAAATAC 
TGAAAAAACAAGAGATACACAATGTTTGTGTCGTCGTGACACGCTACTTCG 
GTGGTATCAAGTTAGGTGCAGGCGGTCTTATTAGAGCATATAGCGGCGCC6 
TGCGTGATGTGATATATGATATAGGTAGAGTCGAACTAAGAGAAGCTATTC 
CAGTAACGX3TTACGTTAGATTATGATCAGACAGGTAAATTTGAATATGAAC
TTGCCTCTACTACATTCTTATTAAGAGAACAATTTTATACCGATAAAGTAA 
GTTATCAAATTGACGTAGTAAAAAATGAATATGATGCTTTTATAGACTTTT 
TAAATCGAATTACTTCTGGAAATTATGATTTGAAACAAGAAGACCTTAAAC 
5 TATTACCTTTTGATATTGAAACCAATTAAATAAAAAACACTATGCAAGTTA 
AAGCAACGCGTGCTTATGACCTACATAGTGTTTTTTTATTATGAACGGTGA 
TGCTCATCATAATGATGTTGCTTACCATTTCCTTTTTTTGTAATTAAATTT 
AATATTGGTCGATAATTATCATCTATTAATCCAGTAAATTCAACGATCAAC 
TCAATCGTAAAGACAATGAGAATGAACATCATAAGTGCACCCAACGGTTGG 

10 GATAAATAGAGGATAACACTAGATAAACTAAACATAATCGCTATTGAATAA
ATAAGTAAAACGGTTTGTCTATGCGTATATCCTAAAGCAAGTAATTTATGA 
TGTAAATGTGACTTGTCCGCTTGCATTATATGTTGCCCTTTTTTCATTCGA 
CGAATCATTGCAAATAATGTATCAATAAATGGCACCGCTAATATAACTATA 
GGAAAGAATAATGCAATAAATGTGATATTCTTAAAGCCGAGTAAGGATAAG 

15 AAACCGATAATAAATCCTATCATTAATGCACCACTATCACCTAGGAAAATT
TTCGCTGGGTGAAAGTTATAGAATAAGAAACCAAGTAAAGACCCTAAAAGT 
ACACAGCAAATCATGATAATAAATATGTTCGCTTGTAAAATAGCGATGAAT 
CCAATAGTCATTAATGCCAATGCTGAGACGCCTGAGGCAAGTCCATCAAGT 
CCGTCGATAAGATTAATAGCATTGGTAATTGCTACAATCCATATTACTGTA 

20 ATAGGAATGCTGAATATGCCAAAATGAATCGTTGGACCAATTGGCAATGAA 
ATAAAGTCTATTGTAATTCCATAAAACGTAACAATTAAAGCTGCAACAATT 
TGACCTGCTAACTTTAAATAAGGTCTTAGATCGTAAATATCATCAATCAAT 
CCAACCATATACATTGTAATTGCACCTAATATAAGCGGTTTAACCTCACGT 
TCAATAGGGTGTCCGAGCCAAATCCCTATTAAGAAAGAAAATAAAATGACC 

25 GTTCCTCCCATCACTGAGATAGGTTTCGTATGTACTTTTCTGAAATTAGGA 
CGATCTACTAAATCTAATTTTTTTGATATTACAATAATAATGGGTGTAATT 
ATTAAACTGACTATCATAGTAAAAGCTATAAGTAATAGTGTATACATCAGT 
TCACCTTCATTGAATAAGTTGTACCTCACGCTACCACTGTTTTGTGATTTG 
CTTTCACTTTTAGTTCTTTCACCTGAAGATTTTTGACTATTTTCAACATCA 

30 TCTTTTCTTAAAACATCATTGCCTTTTCCGGATACTTCGCTTAAATCAACG 
CCATTTTTACGCGCATGTCGACGTGCAGATGGTGTTGCATTGATACGTTGC 
TGGCTATTATCTTGTGTATTTTCACTTTCACTAGAATTCGGATTTGATTGT 
TTTGGCGTTTCTTTCTCTTTAGCTTCATCTTTTTGTGAATTTTATTATCTT 
TAGCAACCTCTTTTAATTTCTTAGTATCTTCATGCTTAACATAATTATTTA 

35 ATTGCTTTTCACTTGAATGACACGCCCATGAATCAAAATAGAAAAAGTTTA 
CAAAAATATAACAAATTGCTAAAAAAATTATTATACCAATAACATAGAATA 
CTTTTTTCATTAAAGTTCACCTAGTTTTCAAAAAAATTTTTATAATTTATG 
ATAACTTAAAAGTGTATATTTTTTCTACATAATATATTTAATTTTCAAGGA 
GATGTAAAAAAGTTGAAAAATTTCGCAAAACTAATTTTGGTGGGCATTTTA 

40 GTATCGGGGTCAGGGATAGCGAGTGTACAAACAAATATAACTCACGCAAAA 
GAAAGTCACGATTCAACTCCTCAAAATATTAAATTAGTGGGAACGTATGAT 
ACTTCTCAAGTTGATTCCAAAACGATGAAACAATTTAAAGAAATAGAAAAA 
GAAGATAATAATTTCCACATAACTAAAC 

45 

Sequence 3366 

step.1002a12.cons.ok

GGGAGTGGGCATTTATGATTCAAATAAAAGGAGCAGTCGATTTCCCTATTT 
CATTGGATAGTACGACTTGGATATTTGACGATAGAAAAGTTACTATTGATG 

50 ATTTAGAACGTGGGGTATTTGATGGTACTAGACCCATCAACTTTGATGATA 
ACAAGGAATGGAACCGTGCTATTTTAGAAGGACAAACAAATCCACCAACGC 
TTGATTCTGAAATTAAATATAAAAAACGATCGGTTTTAGATGAAACATTTG 
TGATAAATATGACACCATTTTTCAAAAACGCCGAACCTATGGAAAATGCTA 
GCACAATAAAACTTTCTAATAAAGATTATTCCATCAATGTACCCATGGACT 

55 TATTACCTTATTTGTTCTTTCAATTTTCTAGTAATGGCAAACGTTTATATA 
GTGATAATGCTGTTGATAGTTTCATATATATTCCTGAAGAAGGATACTCTC 
ATCAATTTAAGTATGTCACACATATAGAGGTGATTTAATATGAGAAAGGTT 
CAATGTATTATTTGCGATACTAAAGTTTTTATTGATGAACATACAGTCGAG 
GCTAAACGCTTAAAAAATAATCCTATACGCACTTTCATGTGCGATGATTGT 
AAAAGTAGACTCGATACACCTAAACAAAGCAAACGAGATTAAAATGATATA
TTGGAAATTTCAAACTATCTCACATAAATAAAGAGTGTGGGACTTAAAGTT 
CCAAGATAAATAAAAAAGACAATTTCTATTGAAATTATATAGAAATTGTCT 
TTTTTATTATTTTTTTGATTATTTTCAGCTCGTTGAGCTGTTACTTTTCTT 
5 ATATTAAGTGCCATTAGCACAAATCCTAGTTCTCTTTTGACTTTATTTATC
CCTCGAACGGACATTCGAGTAAAACCCAAAATAGCCTTCATAAATCCAAAA 
ACAGGCTCCACATCAATTTTTCTTGACTGTAGATGCTTTTTGTTTTTGGTT 
CTGAAAACTTTTTAATAATTTGGACTTTAAAATATTCCCAATTATAATTTT 
TCATTATTTTTTTATTTGCTTTTGAATCGATTAATTGTTATATAAGAAGGC 

10 TTTTGATTTTATGATAGACACATCATTCGAATATTATCATTAAGTATTTTT
TCTATCTACGTCCTGAAAACACAGATTGGGTATAGGCATATAAAACTACTT 
TTAACATTATTTTTGGATGGTATGATATTGCATCACGATGATGTCTGAATT 
CAGTCTCTAGAATTGTTTCTACAATATCATTTACATGTCGTGAAATATCAT 
TTGTGGGGATAA6AACTGAAGTTTCCATTGGTAGAGTAAGTTGAGTCATGT 

15 TATAATCTTTATACATAAGGCACCTCGTTAATTTAGTTTAGTGATGTTTAT
TAAATTATACGAAAGGGCCm?ATTTTTTTAAAGTATTTTAATGTAAAATTA 
CATATGAATACAAAGTATTTTGGCGAGACTCTTGAGGGAACAGGACAAGCT 
GAAGACTACAGGCTGAAGCTGTCCTCTAAGAAAGCGAGCCAACAATACGAA 
GTATTGTAAATAAAGAAGCCAGTAAATGAATTTATGAAAACTCATTTACTG 

20 GCTATTTTGCTAGGAATTATGTCTCAGGCTCTTTCCTACGATTAGTCAAGT 
ACTTATAATTAAAATAAGCATTACTTAACTAAACCGTTCATTTTAATTTTT 
TTGAATAAGGTGTTAAACTATATTTGATTGTACGTTTGGCTAATGTGGTGA 
CATTTGATAATCGTACAATTTATGATTCATTACAAGATAATGAATTGTTTT 
TAATAATCGATTTATACAAGCAATGATGGCAGTCTTATGAGGTTTCTCATT 

25 AGGCTGCTTTCTTAGTTTGTAGTAATAATCGACGACATGATTGTCATAATG 
ATGCTGCCCTCTTATTATATTCATAATCACCCAAAATAAAAGTTTTCTCGC 
TTTTTTATTACCACGCTTGTTGATGGTATCTCTACAGTGTGTATGACCTGA 
TTGATATCGTTTGATATCAATGCCAACAAAAGCATTGAGTTGTTTATTTGA 
TTTAAATCGCTTAATATCACCAATCTCCCCAATAATCATAGCTGTGCTTAG 

30 CTTACCAATACCAGGTATCGAATGAATATTTTCAAAATAATCGAGTTGTTG 
TGCTAATTGAATCATGGCATCATCTAATTGTTTGAGATGATGAATAGATTG 
TTTTAATTGTTGAATAAGTAAGCGTAATTTTTCGACTAGAAAGGAATGTCT 
ATCGACATTAGGATAGCTTTCTTGAGCAATCACCCTTAATTGAAGTGCATA 
TTTTGTAGCTTTATCCATTGACATTCCCTTATCTGTAGAATTGAATATATG 

35 TGTAATCAGTACCTCCTTGTCGATATCAAGAACCATGTCTGAATGAGTAAA 
GATTTCTGCGATGTTGAGTGCAATGATTGAATATCGACTACTAAACAATCT 
TTCTAAACCAGGGAATGTTTGATGGAGTAATTCAAGGATCTGAAATTTAAG 
TCGATTTTGTTCATTCTCGATTTCTAGATGAAAACGGACGCGTTCTCTTAA 
TTCAAAGAATATTAACTCATGTATAGGTAAGCTGTCTGTTTGTTTAAGC6T 

40 CGGTCCTAAACAAGCAAGCTTATGAGCATCTGCCTGATCAGTTTTCCATGA 
TCTTAGAGCGCTCGTTTTAAATTTGGCTTCTAACGGATTCATTTGAATATA 
GTTAATTTGATTTACACAACAAAATCGTTCCATACCTCTTGAATAGATACC 
TGTAGATTCAAAAATGAGTTGTGGGTGGTCTAAGTCATTCAAATACTTGAG 
TAAATAATTGTAACCATTTTTATTATTCTGGATGAAAAACTCTTTTTGGAA 

45 TTTTCCATTTTTATAATGTGCAACTACACTACTTCTTTTACTAATATCAAC 
ACCTAAGTAATCGATAAAAA/^ACCTCCTTTGAATAATTGAGAAGCTAAAAA 
CTTTACTTAACCTTTCTCATTTCATTTTCCTATACACGGTTTCAAGAACCC 
AACATACTACAAACGAATTTCAAAAGGCGAGAGTAAAGCTGACTTGTTTTT 
TATACGGATTTAAAATCCAAGAGTCTGGACAGTCTACTTCTCTCTATAACT 

50 ATAAAAAATAGCTATGAAAAAATCTATCGTCATAGATTTCTTCATAGCTAA 
TCTTAGTATGTTTGTAATATATTTAAATTATTGTTCGTCTTGCATTAATTG 
TTTAACTCTTTTTGCTTCTTTTTCACGGGCAGATTTATTAAGTATTTTCTT 
TCTTAAGCGAATACTTTCAGGAGTTACTTCCACCAATTCATCATCATTGAT 
AAATTGTAACGCTTCTTCTA 

55 

Sequence 3367 

step.1002b02.cons.ok

CTAAAAAATAAATTGTATATAACTTACTTTGAAATTGATCGAATATATAGT 
ATCTTTAATAAAATGCAGATATTGTGGCGTAGAATTTGAGAATCAAAAAAT
GATTAATATAGTTTGAAGAAACTGAGCATAAGTACTAGAAAAATAGCCAGT 
AAATGAGTTTATTATGAATTCATTTACTGGCTTCTTTAAATAATGATTTAG 
ACAGTTTGCTTTTCTAGAACTACTTTGAAATGCGTTATTTTCTTTAAGAAT 
5 ATTAATATATAACTTGTGTTGCTAATGCTAACCTTGGTCAGAGTGTATTGT 
CATGCGATAGTTAAGATGAGATCTTTTAATTAAAACCTTGTTTTAAACATC 
GATGACAAGGTCTAATGTAGGAGGTGTAGAAATAATTTTCAGAATGATATA 
AATAAATCCATAAATGAAGATAAATCGAAACTATTTATACCGTAGGGGGGT 
ATATGTTATGGTATAAGTAACTTTACTATCACGTTTTCATTGGGAGGGGTT 

10 AATTTGAATAATAATGGTGAAGAGCATAATCATCTU^TCACATGAATCAT
TCCAATCACATGCATCATGATAACCATGCCTCACATCATCATAGTGGCCAT 
GCACATCATCATGGAAATTTTAAAGTTAAGTTTTTTGTTTCATTAATTTTT 
GCAATACCTATCATTCTCTTATCGCCAATGATGGGTGTTAACTTACCTTTT 
CAATTCACATTTCCAGGTTCTGAATGGGTAGTGTTAATATTAAGTACAATT 

15 TTATTCTTTTATGGTGGTAAACCGTTCTTGTCTGGTGGTAAAGATGAAATT
GCTACAAAAAAACCAGGCATGATGACCTTAGTTGCTCTAGGTATTTCAGTA 
GCTTATATTTATAGCTTGTATGCTTTTTATATGAATAACTTTAGTAGTGCA 
ACTGGTCATACAATGGACTTTTTTTGGGAATTAGCAACCTTAATTTTAATT 
ATGCTATTAGGACATTGGATAGAAATGAATGCTGTCGGAAATGCTGGAGAT 

20 GCTTTAAAGAAAATGGCAGAACTATTACCTAATAGTGCTATTAAAGTTATG 
GATAATGGCCAACGCGAAGAAGTTAAAATATCAGACATCATGACTGATGAT 
ATCGTCGAAGTAAAAGCCGGAGAAAGCATTCCAACAGATGGTATTATCGTT 
CAAGGACAAACATCTATAGATGAATCCCTAGTCACTGGAGAATCTAAAAAA 
GTACAAAAAAATCAAAATGACAACGTCATCGGGGGTTCTATTAATGGGTCT 

25 GGAACAATACAAGTCAAGGTTACAGCTGTTGGAGAAGATGGATATCTTTCT 
CAAGTTATGGGACTTGTTAATCAAGCACAAAATGATAAATCTAGTGCTGAA 
TTGTTATCTGATAAAGTAGCGGGTTATTTATTCTACTTTGCTGTAAGTGTT 
GGCGTGATTTCTTTTATTGTCTGGATGCTCATTCAAAATGATGTTGATTTT 
GCATTAGAACGTCTTGTAACTGTGTTAGTCATTGCTTGTCCACATGCTTTA 

30 GGCTTGGCAATACCTTTAGTCACTGCACGTTCTACTTCAATTGGTGCACAT 
AATGGTTTAATTATTAAAAATAGAGAGTCTGTAGAAATAGCTCAACATATC 
GATTATGTAATGATGGATAAAACTGGTACTTTAACTGAGGGTAACTTTTCT 
GTGAATCATTATGAGAGCTTTAAAAATGATTTGAGTAATGATACAATATTA 
AGCCTTTTCGCCTCATTAGAAAGTCAATCTAATCACCCATTAGCTATAAGT 

35 ATTGTTGATTTTGCGAAAAGTAAAAATGTTTCATTTACTAATCCACAAGAC 
GTTAATAATATTCCAGGTGTCGGATTAGAAGGTCTAATTGATAATAAAACA 
TATAAAATAACAAATGTCTCTTATCTTGATAAACATAAACTTAATTATGAC 
GATGACTTGTTTACTAAATTAGCTCAACAAGGTAATTCAATCAGTTATTTA 
ATTGAGGATCAACAAGTCATTGGCATGATTGCTCAAGGAGATCAAATTAAA 

40 GAAAGCTCAAAACAAATGATAGCTGATTTACTATCAAGAAATATTACACCA 
GTCATGCTTACAGGTGACAATAATGAAGTGGCACACGCTGTCGCAAAAGAA 
TTAGGTATTAGTGATGTTCACGCACAACTCATGCCAGAAGATAAGGAAAGC 
ATTATAAAAGATTATCAAAGTGACGGTAATAAAGTCATGATGGTCGGAGAC 
GGTATCAACGATGCGCCGAGTCTTATAAGAGCCGATATTGGTATAGCAATT 

45 GGTGCAGGCACAGATGTTGCAGTGGATTCAGGTGATATCATACTTGTTAAA 
AGTAATCCATCAGATATCATTCATTTCTTGACTCTTTCAAATAATACTATG 
AGAAAAATGGTGCAAAACTTATGGTGGGGTGCAGGTTATAATATTGTTGCT 
GTACCTTTAGCAGCTGGCGCATTAGCTTTTATCGGGTTAATATTATCACCA 
GCTGTAGGAGCAATATTAATGTCTTTAAGTACAGTTATAGTAGCGATTAAT 

50 GCTTTTACATTAAAATTAAAATAAAAGAGGTAAACCTTATGTATAATAAAG 

tttttgcaattttaattataattttttccataataattattxk:gtctaatg 
atactttcgcagaaagtaagaatgatatgatgaatatgaaagaagataaga 
aaaatacaatggatatgacaaatatgaaacatcatgacgaaagaaagaaat 
taaattcttcacaaggaaaaaatgaaataatatttcctaaagttgcagagt 
55 caaaaaaagataacaatggttat7vaaaattatacattaaaagctcaggaag 

GAAAGACAGAGTTTTACAAAAATAATTTTTCTAATACTCTAGGCTACAATG 
GAAATTTACTTGGACCAACTTTAAAATTAAAAAAAGGAGATAAAGTTAAAA 
TTAAGTTAATA7VATAACTTAGATGAAAATACAACATTTCATTGGCATGGAT 
TAGAAGTAAATGGAAAAGTGGATGGAGGGCCTTCTCAGGTTATAAAACCAG 
GAAAAGAAAAAACTATAAAATTTGAGGTTAATCAAGATTCTGCTACGTTAT
GGTATCACCCCCACCCCTCTCCAAATACAGCTAAACAAGTTTATAATGGCT 
TATCAGGATTATTATATATAGAAGATAGTAAAAAGAATAATTATCCTAGTG 
ATTATGGAAAAAATGATTTGCCTATAATAATCCAAGATAAAACATTTGTAT 

5 CTAAAAAATTAAATTATTCAAAAACGAAAGACGAAGATGGCACTCAAGGTG 
ATACTGTTCTTGTGAACGGAATAGTAAACCCCAAACTGACAACAAAAGAAG 
AGAAAATACGTTTGAGACTTTTAAATGGTTCTAATGCTCGAGATTTAAATC 
TTAAGCTAAGTAATAATCAAAGTTTTGAGTATATTGCTTCAGATGGCGGTC 
AATTAAAAAACGCTAAAAAATTAAAAGAAATTAATTTAGCTCCTTCAGAAA 

10 GAAAAGAAATAGTAATAGATTTATCTAAAATGAAAGGCGAGAAAATCAGTC
TGGTTGATAATGATAAAACTGTAATTTTACCGATTAGTAACAAAGAGAAAA 
GTTCTAACAAAGGTAATACACCAAAAGTAAGTAAAAAAATAAAATTAGAAG 
GTATGAATGATAATGTTACCATTAATGGTAATAAATTCGATCCTAAAAGAA 
TAGATTTTACACAAAAGTTAAACCAGAAAGAAGTATGGGAAATTGAAAACG 

15 TCAAAGATAAAATGGGTGGTATGAAACATCCTTTCCACATCCAT



Sequence 3368 

step.1002b04.cons.ok

20 TGATAAATTAGTGGTTAGCTATATTTTTTACTTTGCAACAGAACCGTTTTT 
TGTATATAGTCTTATAGGAATTGCTATATTTTTTGTGCCGGTAACGATTAA 
TAATAACTCTTCAATTTTATTAGATCATTTTGTCACATGGATAAGTAGCGC 
ATTACCTCTTTTAACTAAGATATTCATAATGATTATCATTATACTAGGTGC 
TATTTATCCATTTATTAAAGGGACATGGAATCGGAATACCGTTGAAACAAT 

25 TTTTAGTTTATTTAAAGTTTTGGGAGTCATTATAGGCGTTTTGTTAATTTT 
TAACATTGGGCCAAGTTGGTTACTTAATGAACAAACGGGAATGTATGTTTT 
TAACTATTTGGTAATTCCGGTAGGATTAACAGTACCTGCAGGAGGCGCGGT 
ATTAGCTTTATTAGTAGGATATGGCTTATTAGAATTTGTAGGTGTTTATGC 
GCAAAAAATTATGTACCCGATATGGAAAACGCCTGGACGTTCAGCAGTTAA 

30 TGCTTTAGCATCTTTTGTTGCTAGTTTTGCTGTGGGTTTACTTATAACGAA 
TAAAGAGTATAAAGAAGGTAAATTCACGGAAAAACAAGCTGTTATCATAGC 
AACCGGCTTTTCTACAGTTACTGTAGCTTTTATGATAGTTATTGCTAAAAC 
CTTACACTTAATGGATATATGGAATTTATATTTTTGGTCTACCTTGTTTGT 
TACTGCTGCAGTAACAGCTTGTACAGTTAGGATTTGGCCTATCAGTAAAAT 

35 TAGCAACACATATTATGATCAGCCATTTATAGAAGAAGATACAAGCGAATT 
AAAAGGTTAAAAAAAGCTACTTTTTGCATGGGAAAAAGCAATGGAAACTGT 
TGAATCATCACCTAACGTGATGAAAAATATTTGGTTG7VATTTCAAAGAAAG 
TCTGATTATGACTATGAATATCTTACCCACCATATTATCAATAGGTTTAAT 
TTGCTTGTTACTCGCAGAATATACAGTGATTTTCGATTATTTAGCATATGT 

40 TTTTTATCCATTAACTTGGATACTTCAAATACCAGATTCCTTTTTAACTGC 
AAAAGGCGCAGCTATTGGTATAACAGAAATGTTTTTGCCTTCATTAATTGT 
AGTCGAAGCACCATTAATCACTAAATTTATAATTGCTGTTACTTCTGTTTC 
TACAATTATATTCTTTTCAGCTAGTGTGCCTAGTATTCTCTCTACTGATAT 
ACCCATCCGCATAAGAGATTTAGTGGTTATATGGTTTGAGAGAACTGTATT 

45 GAGTTTAATTATAGTAACACCTATCGCATATATTTTTTTATAATTCAATAA 
TAAAATATCGTATATAAAAAAATCACCTTTTTTACAAATATTTATAGTGAT 
AAGGAAAGCAAGGATATGAAATTTCCATCTTAATTTACTGTTTTTTATTTT 
AAACTCAATTATATATTTAGGGTACACCCTAATCATACAAAATGAATTGTT 
GAAATTACAGCAATTAAAATGTATGATTAGGGTGTATATTAATTATGAGGT 

50 GAAACTAAATGATTGTGGGCTATGCGCGGGTATCATCAGTAGATCAAAATT 
TAGAAAGACAACTTGAAAATTTAAAAACATTTGGTGCTGAAAAAATTTTTA 
CAGAAAAACAATCAGGTCAATCTATAGAAAACAGACCCGTCTTTCAAGAAG 
CATTGAACTTTGTGAGAATGGGAGATCGTTTTGTCGTGGAATCAATTGATC 
GTTTAGGTCGTAATTATGATGAAGTCATTAATACCGTTAATTATCTTAAAG 

55 AGAAAGAAGTACAATTGATGATTACCAGTCTACCAATGATGAATGAAGTCA 
TTGGTAATCCATTATTAGATAAATTTATGAAAGATTTAATTATACAGATAT 
TAGCAATGGTTTCAGAACAAGAAAGAAATGAAAGTAAACGTCGACAAGCTC 
AAGGGATTCAAGTTGCGAAAGAGAAAGGCGTATATAAAGGACGACCTTTAT 
TATATTCTCCCAATGCGAAAGATCCTCAAAAACGTGTTATCTATCATCGTG 
TTGTAGAAATGTTAGAGGAAGGCAAAGCGATTAGTAAAATTGCGAAAGAAG
TGAATATTACAAGACAAACTGTTTATAGAATTAAACACGATAACGATGATT 
TGGGTTCTGTTGCAAAGTAAAAAAATATAGCTAACCACTAATTTATCATGT 
CAGTGTTCGCTTAACTTGCTAGCATGATGCTAATTTCGTGGCATGGCGAAA 
5 ATCCGTAGATCTGAAGAGACCTGCGGTTCTTTTTATATAGAGCGTAAATAC 
ATTCAATACCTTTTAACTTCGAAATGTTTTTTTATTTCTTCGGCAAAATAT 
TGGGGTGTTAAAATATTCGGAGGCATTTGACTTAGTTCACGAGCATAATTT 
ATAGATTCACCCAATTGTTGGCCTTCGTTAATCTTCGTTTCTATATTATGC 
TCTGTCGTTTGTATATAGACTCTACCTTGATATGGAGGCTTTTTATCCGAT 

10 TTATAATGGTCAAATTCAAAGATAGATTGTGCACTTTGTAAACCTAAAGTT 
TGACAAATTTCAACAATGTTACCATGTTTAGATTTGAAAGTATCTAACAAT 
AATTCACAATCCGTGACATGAACGTCTTTTAAATATTGAAACAACTGACCC 
CAAACTTTTAATAAACGTTGATATGATAAGATTTTTAAGTTTCCTAGTCCT 
ACAGTTAATAATCTTTTAGGCTTTTCATCAATGTAGATAAGCGTTGATGAT 

15 ATTTTGCCGATTGAGCTATTGATGACATGATATTGTTTTAATGTTTCTAAT
CGTTCTTTAATTAACGTCTTATGATATTTTATCTCACCTAATTGGTTGAGG 
TGATCTGGCACACCCACCACAATCGTTTCAATTGATGTTGGCGTATCTTGA 
TTAATAAAAAATTGCATATAGATCCTCCGTTCCTCATTCTGTATAACAATA 
ATTATAAGTTATGAACACCCCAATATTAAACAAAACCACTTTTTACTCTAA 

20 CTCTAAATAAAAACCACTTTTGTTTAATTAAGTGTGAAATTCTTCTTCATC 
ATATTTTTCTGATAAAAAATAACCTGGGACAATCATTTTGTCCCAGGTATT 
ATTAAAATATAAACAATCGATGACAAAATTAATTTGAGTTGTTTACATATT 
TTAACTCTAGTCATCTTTGCTAAGGGAGACAACA7VATTATGTCTCGTACCT 
TTTTATTGTTTTTTTATATCTATTACTTGATTAAAATTTTCCTTTTTTAAA 

25 GGCTAATCCTATGCCACCAAGTTTGTAAATCGCACGAGTATCAATAACTTT 
CTTCAAGAATGCTGCTTTTTTACCAGCGAT 



Sequence 3369 

30 step.1002b05.cons.ok

CAATATGTTCGAGTAATGACTTGCGCGCATCGTCTTGCTGTTTCAAATATT 
CAAATAAAAAGCCAGTCCATACCCAAATTGTTTTTGTATTAAGAGCACTAC 
AAGGCGTATTTGCTGCATCATTTTCTCCT6TAGTCATGACTTATACAACTG 
AAACTTATCCACGTGTGAAACGTGTAACAGCAATTAGTTTCATCAGTACAA 

35 GTTTTATGTTATCAGGTGTGTTAGGGCAAAATATGAGTGAATTAGTCGTAA 
GTTATTTGAATTGGCAGTGGGTATATTTTATTTTAACAATCTTATATTTAA 
TTCTCGTATTAGTTATTTATAAAAATGTACCTGAGAGTCCACATAAAAATC 
CTGATATTCAACTCATTAAGTTTTTCAATAACTTTAAAGATTTCAAAGACA 
ATCTTAAAGTTTTCTATTGTTTATTTATTTCACT/^CATTACTGATTATGT 

40 TTATAAGTATGTATGATATTTTAAATGAATATGTCACATCACACCAAGTTG 
GGGGAGACATGTCTGTGTCCTCAATGATGAAATTGTTCGGTGTGATAGGCA 
TAGAAACTAAAGTAATCATTTTTATTTAAAAGTAGATTTTAATTTTTATAT 
ACATATTAGCGTTGTGGAATAGGTGGAAAGGGGGAGTCAAAAGGTGAAAAC 
GCTAATGATTAAAGCAATGGGGACGGTGATACGTTTATCGATTGAGCATCA 

45 ACATCCGGATACATTACTTCAAGAAGCTGAAATAAAAATTCGTGCTTGGGA 
ATCACAATTTAGTGCTAATGATCCGAAATCAGATTTGATGAATGTGAATCA 
GCATGCAGGTATCGCACCAGTCAAGGTTAGTTCTGAGATGTTTAACATGAT 
ACGTTTTGGTTACGAAACTACATTATCTTCTAATTTTAAGATGAACATTTT 
GATAGGGCCACTAGTCAAATTATGGAAAATTGGTTTTAAAGATGCATTGAA 

50 ACCTAAAGAAGAGGATATACAACGTGCTTTATTGTGTATGGATCCTGAAAA 
TCTTGTTCTAAATTCAAAAACACATGAAGTATTTCTTACACAATCAGGAAT 
GGAGATTGATTTAGGAGCTATAGTTAAAGGCTATTTTGCTGATCAATTACA 
GCAATACTTTTTAGCTCATGGTGTATCTTCTGGCATTATCGATTTAGGTGG 
TAATGTTTTAACAATTGGTAGACAACCCGAAACATTAGAAAAATGGCATGT 

55 AGGTGTACGTAATCCATTTCATAAGGATACACTACCACTCGTTACATTAAG 
CGTAGAGCATCAATCAGTTGTCACATCAGGTATCTACGAACGCTACTTCAT 
ACAGGAAAATCAATTATTTCATCATATATTGGATTCAACAACAGGTTATCC 
TGTAGATAATGATATCGCTAGCGTGACAATCATATCTGATCATGGGATTGA 
TGGCGAGGTATGGAGTACAATTTGTAGTTTTGGTCAGTCACAAAAAAATAT 
TGAATTATTAAATCTCATTGACGGTATTGAAGGCATTATTGTGACAAGAGA
TGGAAGCGTTTTAATGACTTCGAAAATGCAAAAGTATTTATAAAAATTTAA 
AGAATACTCATGAAGAGTATTTTATTTTTAAGTAATAATGTGAAAATATTC 
ACAAAAAAACTAGGAGGATTTGCTATGAGTAAGGAAATATTCGATACTTTT 
AAATTTAAATGTGGTGCCGAATTAAAAAATAGAGTATTAATGGCACCCATG
ACTATCCAAGCTGGGTATTTTGATGGAAGTGTTACATCAGAAATGATTGAT 
TATTATCAATTTAGAGCTGGTGATGCTTCAGCAATCATTGTTGAAAGTTGT 
TTTGTTGAAAATCACGGACGAGGATTTCCGGGAGCTATAGGTATTGATAAT 
GATGACAAAATACCTGGACTCAAACGTTTAGCAGAAGCGATTCAAGCTAAG 

GGATCAAAAGCGATTTTGCAACTTTATCATGCCGGAAGAATGGCAAATCCT
AAATTTAATGAAGGAGAGCAGCCGATATCTGCGAGCCCCATTGCAGCATTA 
AGACCTGATGCTGTACCACCTAGAGAAATGACACATGCTCAAATCAATCAG 
ATGATTGATGACTTTGGAGAGGCTACACGTCGCGCTATAGAAGCGGGGTTT 
GATGGTGTCGAAATTCATGGCGCCAACACATACTTATTACAACAATTTTTC 

TCTCCACATTCTAATCGGAGACAAGATTCATGGGGAGGCAGTCGTGAAAAA
CGTACACGATTTCCAATCGAAGTTTTGACAAAGGTTCAACACGTCGTTGCT 
GAAAAAGAGGCTTCTCATTTTATTATAGGATATCGATTCTCACCTGAAGAA 
ATTGAAGAACCAGGCATACGTTTTGAAGATACCATGTTTTTACTAAATACA 
TTAGCAGAATATGAACCTGATTACTTCCATATATCAGCAAACAGTTATCAA 

CGTACATCTATTGTGAATCAAGAAGATACAGAACCTTTAATTAATAAGTAC
ATCAAAATGCAAAGTGCACAGTTGGCAAAAATTCCATTAATTGGTGTAGGT 
AGTATTGCCCAACGACAAGATGCAGAACATGCCCTTGAACTAGGATATGAT 
CTTTTAAGTGTTGGGAAAGCCTATTTAGTGGAACCACAATGGACAGATAAA 
ATTTCACAAAACGAAGAAGTAGAACAATTTGTCGATATACATGATCAGAAA 

GTACTTCACATACCATCCCCTTTATGGAAAGTAATGGACTTTATGATTTTA
GATAAAGAAGAAGAGCATCGTAAATATGAAAAATTAAAAGCACTTCAAAAT 
AAAAAAGTTAAATTTAACAAAGGTACGTATCATGTCTATGC/yUUVGGTCAT 
AATGGCAACTTACCTATGAAAGTCCAATTATCAGAAGATAAGATTGTAAGT 
ATCGAGGTAGATGATAGCGGAGAGTCTGAAGGCATAGCGAACCCAGTGTTT 

GAACGTTTACCTCAAGATATTATCAATGGGCAAACACTGAATGTAGATGTC
ATTTCAGGTGCGACAGTAACAAGTGAAGGCATCGTGCAAGGTATTGCAGAT 
GCAATTGAACAAGCAGGAGAAGACCCAGATATTTTACGGGCGCGTCCTAAA 
CCAGTCGTTCAGTGGTCTGATGAGGTTGTTGAAGAGA 


Sequence 3370 

step.1002b07.cons.ok

GATAAGAATTCTTAATTCAACTTCGGCAATATTATAGTCTATAATATACAA 
GTCAATTTATTCAGTTGTTTCAATCAGACCGTATTTGCCATCTTTTCGTCT 

ATAAACAATGCTTGTGCCATCAGTTTCTCTATCGGTAAAAATAAAGAAATC
ATGTCCCAATAAATTCATTTGCAATACTGCTTCTTCAGAATCCATTGGTTT 
TAAGCTGAATTGTTTAGAACGAATAATTTCAATATCATTTTCACTTTCTAT 
ATCATCTGCATGATTGTTTGTTGTAGACTCTTGTACTTCAGCAACAAAGAT 
ATCTTGATCTCCACGATCTCTATGTTTACGATTTACACGAGTTTTGTATTT 

ACGTACTTGTCTTTCCAATTTGTTAGTAATTAAATCAATGCCTGCATATAA
ATCATCATGTCTTTCTTCAGCTCTAAGAGTGACATCTTTTAAAGGAATTGT 
AACTTCAATTTTAGTTGTAGAATTAGAATAAGTTTTAACTCTAACATGTGC 
TACAGCATTTGGCACATTGTTAAAGTATCTTTCTAATTTACCTACTTTCTC 
CTCAATATAGTTGCGAATTGCATCTGTGATAGTGAGGTTATCTCCATGAAT 

TTCAAATCTAATCATAGTAATCTCTCCTTAAACCTCTATATTGATACTTCT
TACCTATATTTTACCATGTTTATGTTATCGTGCAAACGCAAACACTTTGAA 
TTTTCTGACTTTTTTATCGTACAATTTACACCCTGCGTGATGAATAGTTAA 
CCCTGTTGTATAAATATCATCTATGAGTAGTATTACTTTCCCTTCGATATT 
TATCTCTTCATCTTTTATATAAAATGGATTAGGGGCTTTTGAACGTTCAAT 

CTTTCCTAACTTTGACTGCTTAGGACGTATATGTGTACCTAATACATCTTG
ATATGAGATGCCCATTTTATCTAAGACAGTGGTCACAGGATTAAATGTACG 
TTCAATGTCGCGTTCAATTGGAGAAGGAATGGGAACTATATAATCATATTG 
CGTTTGAGGTAAAACTAATTTTCTCGCCAATACTTCTGCCAAATAGAAGTC 
TCGCTTAATCTTATACTGGTGTATGATTTCTTTCATCATACCATCATAACT
ATAATCACAAAACAATTGTTCCATTAAATAAAATGTTTTACTTAAGAATTG
ACAGTCATGACATTCGCCATCAGTAGAAATTTTGTTGTTAAGGCACTTAGG 

ACACCTACCATCTTTATGAATTTTGGAGTTAAGCCAT7VATGTTTCACATCG 
TTCACAAAATATCTGAGGTGGTTTAAATAAGTTATAAATATGAATGGTCTC 
ATGAATAGGTTGATTGCACTGGATACATTTACGCATCAATCCATCCCCTTT
TAATTGCTAAACGATTCATTGAAATAATGTTTCTTTTAGCTAAAATCATCG 
ATAATGTAACACCTTCATGAAGAAATAAAACTAAGCCACTTGGAGACTGCT 
GTTTACGTCCTACGCGACCAGCAATTTGAATTAAAGCCTCTTGTTGAAAAC 
TTCCAGCATCAACTACAACGACATCTAAGTGTGTCATTGTAAATCCTCTTT 

CTAAAATAGTTGTAGTGAATACAATTTTGTGTTGTCCTCGTCTTAAAGCTT
CAATTTTTTCAAATCGTAAATCATCTTCACTGTGAACGCAAATCAAATCAG 
GGATGTCCATTTTATACTGTTGATACATTTTATTCATAATTTCTATATTAT 
TAATAAAGACCAAAGTAAAACGTTGTTGGTTAATTTGATATCTAAATATAT 
TAAGTAATAAATTTTGTTTTCGTGTTGATTTTAATTTGAAATATTTGAACT 

TAGGAATAGGAAGGGGGGATCGGTGAAAACGGGCTGGTAACTTAATTATTT
TTTCTGGGGGGAATTGTTTTAAAAAATGACGCGGTGGTGTGGCCGTCATGA 
AAATATGTGAATGATTCGATTTTGAAGCAAGTTGTATTGCATTTGATAATT 
GTGGATCCATAGACAACGGAAAAGCATCTACCTCATCGACAAATACAGTAT 
CAAAATGCTGTTTAAACCTCAATAATTGATGGATAGTAGCAATAACAAAAT 

GACCATTATATTGCTGTCTACTAGATTGATGTAGCACATCTATATGTTCAT
CGATAAAAGCATCTTTAATTCGATGACTTATCTCAATAATAACGTCTACAC 
GAGGTGATACAATAGCAATATTATGTCCCATCTGACGAGCTATCCTAATTC 
CCTCAAACATCATTTCAGTTTTCCCTGCTCCTGTGACGGCATATAACAAGA 
GGTCATTATTATTTTTTATAGCTTGAACGATAGTTTCTGAAGCGTATTGCT 

GTTGCTTTGATAATTCAAAGGGTAACTGATAATTCGCTTTTGTTGCTTTTT
GAAAACTTTTTACGAGACATATATCTGTAATACTATCCATTCTGCCTAATT 
GAATGCATCGTCTACAGTATGTGATTGTTTTATCCAATATCGTTGAATTGT 
AAGTCACAAAATATTGCAAGGAAGTGTTTCCACATTGAATACAATACCAAC 
GCTCACTGTCTTTTATAACACCTTTTGAAACTTGAGATACTGATTCATCTT 

CAAGTTGATGCGTTGGAGAAATCAGTCTACCGTAATATTTAATTTTACTTC
ACTACTTTCAAAAAAGTCTGAAATACTCAAATAAGCGAACACAAGTGATAT 
CTAATGATATTACCACTGTGTTCTTTCCTATCAAGCTTTCAGACTTTTGAT 
TATATATTAATTAATGTCGATTCTTCTTGGGAAGTAACCCAATCCTAAACC 
GCCTGATCCTAAATGTGATGCTATCACTGGTCCAAATTCACAATACTGAAT 

ATGAACATTAGGATGATCTTCCTTTAATTGCTGAAGAAATGACTTTCCATC
TTCAGTTTTATCACCGTTTATTACAAATACTGTCACATCTTCCATGCCTTC 
TATTTCTTTAAAAATGTTTGTTTCTAAAGATTTTAGCGCACGTTTTTTAGT 
ACGTACTTTTTCGTGTGGATGTATTTTACCATCTTCTTCAAAACGCAAGAC 
AGGTTTCATTTTCAATAATGTACCTACCCAAGCTTGAGCTCCAGTGATACG 

ACCACTTTTTTGTAAATTTTTTAAATCATCAACAATTAAGTATGCACCAAT
ATGTTGTCTTATTTCAGTTAGTTCATTAATAATATCATCAGGTTTATATCC 
CTTTTGTACCAATTGAGCAGCGTAAATTGCAAAGCTACCTTCAATCATCGC 
AGCAAGACGGCTATCAAACGTATGTACTTGAATATCTTCAACCATTTCACC 
AGCTTGTGTTGCTGAAGGATAGCTTCCGCTTAT 


Sequence 3371 

step.1002b08.cons.ok

ATGTACATGTTGGTGTGTTTGACGTTCGATATTTTGTAATCTTTGGTAAAC 
AAGCTTCCATGTTCCACATGCTTCTTCTGGGGTTTGATAATCAGAGGTTAG
TTGATTCATAACATTTAATAATTGCCCAGTATCTTCGTGGTCATTAATTAA 
ATCTTCTAATATGATTTTAATGTTTTGTACATCTTGTCCTTGACTATATTG 
AATGAGTTTAGGAi^AATCTTCCTCATCTTCTTTACGTATATGATCCAACAT 
ACTATCACGAAACTCGCGATATAAGTCTTGTAATTTTAATAAGTATGGGTG 
ACTAGGACCATGTACTTTTGCCAATTTAGTCATGTAAGGTGTAAGATTCTT
AAATTCTTCTTTAAGCGTTTCGTGATAAGCTGATTGTATATATTGTATAAG 
AGATTCAACATTT/JU\AATTTAGGATTAATGGTACTGTTACCTTCTGTATT 
ATCAATATGATTCAACTTATTTAATAAGGAATTTAAGTCAATATTTGGTTT 
ATGATTGACAGCTGAAGCGATACTCTCTTGTCCGCCACAACAAAATTCCAT
TACGAAGATGTTGAAAAACGTAGGGAAGAATTTCATCCTTCAGGAAGTCGT
TCTAATTTAACCATTAAAGCAATGGGAAAAGAAATTCTTCCTTATATCGAT 
GCGACATTTCCAACTTATAAAGTAGGTAATACAAGGTTACTTATTGGAGAT 
AGTTTAGCAGG7UVGTATCGCTTTAATGACTGCAATGACTTACCCAACTATT 

TTTAGTCGAGTTGCGTTATTGAGCCCAATGTATAATGAAAATATTAAGAAA
AAAATTGATACATGTATGAATAAAGGTCAATTGACGATATGGCATGCCATT 
GGTTTAGAAGAAGCAGArrTTATTTTACCAACTAATGGTAAAAGAGCTAAC 
TTTTTAACACCTAACCGTGAATTAAATCAACTGATTAAAGAAGATAATATT 
GAATATTTCTATAAAGAATTTAACGGTGGACATCATTGGAAATCATGGAAA 

CCATTGCTAGGAGATATTCTCTTACAATTTTTAG6TGATCCAATAAATGGA
AAATATGTTTAATAAAGTAAGATAAATAAAATAGTATGTTAGCGAATGGTT 
GTTGAATGCGAAACACTATTAATTTCAGAAATGTAATTGTTTTCATGATAA 
AAGTAACGGTTTTCAAAAAGTTTTGATATAATAGGAATAAGTTAAACAAAG 
GAGGAATTTAAATGATTTTAGGATTAGCATTGGTTCCGTCAAAGTCATTTC 

AAGATGAGGTGAATGCTTATCGCAAGCGATATGACAATCATTATGCTCAAA
TAATGCCTCATATCACGATTAAACCTCAATTTGAAATCGATGATCATGATT 
TTAATTTAATTAAAAATGAAGTGAAAAATCGAATTTCTAGTATTAAACCAG 
TAGAAGTACATGCTACAAAGGCATCTAATTTCGCTCCAATCAGTAATGTTA 
TATACTTCAAAGTTGCTAAAACAGAGTCATTAGATCAATTATTTAATCAAT 

TTAATACAGAAGATTTTTACGGTACAGCTGAACATCCTTTTGTACCACATT
TTACAATTGCCCAAGGTCTAACAAGTCAAGAATTTGAAGATATATATGGTC 
AAGTAAAATTAGCAGGGGTAGACCATAGAGAAATAATTGAAGAACTATCGT 
TACTTCAATATAGTGAAGAAGAGGACAAATGGACTATTATTGAAACTTTTA 
CATTAGGATAAAAAGTTCAAAATTGTAAAGTGAAATTGAATTTACAGAATC 

ATTATTGTTAAATACGTGAAGAGCGCTTAATCAACTAAAAACGCCAAATCC
TATTGTGTTTCAGTGGGATTTGGCGTTTTTATATGAAACATAATTATATTG 
TATAGTTAATCATTCAAAAAGATTAATGTACGTTTTATATTGAATATTCAT 
ATTTCAAATGCATATGAATGACGTTTCATTAATATATACAACTAATCTTTA 
TATAAACGATTTATAGATTTTAAGGTTTAACCCTTCTGTTTGTTTTAATAA 

AGTAATATCCGTAAAATACTGCAAGTGCAAGAAAAATCATCGCTGAAAAGT
AAAAGGTATTATTTAAATTATTAGTAAATTGAGTAATTAGACCTCCGACTA 
GTGGGCCTATCATTGAACCGAAGCCTTGAACACTGTTGAACACGCCCCATG 
TTTCTTCCTGTTCGTTAGGATTAATATGCCCAGCCATAAAGGTATTCCAAG 
CCGGTAAGAGGATACCGTACATTAGCCCAATAAAAAGTCCTATGGCCCAAA 

CTATATATATATTTGTAATTGTAGATAGCCCGAATATAAGAATTGTATATA
GTATAAAGCCACTAAAAATAACTCCATACATAAACCCTTTGCTATTATTGT 
CGATGATTTTTGATAAAAATAACATAGAGAAAGCACAGCCTATGCCACCAA 
TAATGATTGCTACTGTATATTC7VACGGTTGATACTTTCACAACTTGCGTTG 
CATATTTTGGAAGAATAGGTACAAGTGCTGCTATAGCTGCTCCTTGTAACA 

AGATACCCGGAAATAGAATAAGATGACGTTGTGTAACATCTACAATTTGCT
TTAATTGTGCTTTCACAGGTTTAGTATTGTAATTTGTTAAGTTGATATTAA 
CAAAATAGTATAGTACCCAGGCAATAAGCACAACCAAGGCCATTAAAAATG 
CAAAACGAGTAGGATGGAATTTAATAAGCAAGTTCATGATAACCATACCCA 
CCAATAAACCTAGCAACCATGAAAAGTAGACATAACCCATTTGTTTGCCGC 

GATTTCTTTCATCTACACTAGATAACATGATAACCCAAATCGGACTCACAG
CAATTCCTAACATAATAGCACTGAAAATAATTATG/^TGGTGATGCCGGGA 
ACCATATCACTAAAAACAAGCTTATAAAAGCAAGTAAG



Sequence 3372

step.1002b09.cons.ok

TTGCTAATAAATATGGCAAGGTTAAGGCGATTGCGAGTGAAATGGTTCCAT 
GTACACCACATAGAGTCATGATTAGTGCATATAAGCTACGCTTAGGTTTAC 
TCTCAGTAACTTTATCrrCATCATTTTTAGAAATCATTTTTTGGAATGGAC 
TGACAGATT^TAGAAATAAGGATAAAGTACATATACCCACACAAATCTAA

AGAGATACACCGCAAGAGCGACTAGCAATGTGATGACGATTAAGAATAATA 
AATTATGCGGTTCAGTTTTAATAATTTTCACTATTACCTCAGGTACTAAAA 
AACCTAAGATTGAAAACACAAAACCATTTAAAACGTAGCCTAAAATACTCC 
ATGTGTGATTATAGCTCATCTGTAATTGTGTTCTCGTTTGAGCTATACGGT
CACGTTCAAATCCATGCACGAGTCCCGCAACTACTGCTGCAATAATCCCTG
ATGCATGGAATAATTCAGCAATCAAATAAGTGACAAACGGCGTCAGCAATT 
GTATAAATGTAAACATGTTAATGTTCTCAATGCCACGTCTCATTAATGTTA 
AACGAAATCTCACTAAAGCCATGCCTATAATTAGACCAACAATAGCACCTC 
CAATAGAGGCAATCAAAAATTGTTGAATCGCATCAAAAATGGAAAAAGTTC
CAGTAATCAATACACCTACGGCAATTTTAAATGAAATGATACCTGCAGCAT 
CATTAAGTAAAGATTCACCTTCAAGAATGGTCATTGAACCTTTTGGTAAAA 
CCTTACCTTTCGTGATTGCTTGTACCGCAACAGCGTCTGTAGGACAAAGTA 
TAGCCGCAATTGCGAAAGCAGCACCTATAGGTAATTCAGGCCAAATCCAAT 

GAATAAACAATCCTACGCCTATAACGGTAGTTATGACAAGACCTAGAGCCA
TCATCATAACAGGTTTGATATACTTCCTTAAATGTACTCTTGAAACATTAA 
CACCCTCAACGAATAGTAGTGGTGCAATCAATGTAACCATAAATAATTCAG 
AATCAAAATTGAATTCAACTGGAATGGGAGTGAGGTATAAAATCATCCCAA 
GGAAAATTTGGATAAATGCAAGAGGTACTTTAGGTATGAAAGTATGTACAA 

AGGAGCTCACAATAACTAGTGCGACAAATATGAGTAGAGTTTCAAATATCT
CCACACTTTCACCTCTTTATTAATTTTTAATTGCAAATTTAGTCATCAAAA 
CAATAAAAA6TTGCAAGGTTATCTACTGCAACACTGAAGAAGTGAAATAAT 
AACAATAGAGTCATTCCTGTAAGGACATAACTATTAACACTCAATCATGCC 
ACAAATATACTATTCAAAACTCAATTATTTTCTTTATCAGTGAAATCAAAC 

ATGAACATTTAAATTAACTTATGAAGCTCGCTTCTTCGAACGAGAAAATGC
TTATGTGTAATTTCATTGAAAAAATTAGTGACAACTGTCATTCACAATTAT 
TCTTGTTTACAATTTTATCATTTATTAACCAGATTCTTAAATAAATAGCTT 
TTATATCATTAATATCTTCACCTCAGATATACAAAAAACTACTAGCATATA 
AAAGATATATCTTTTATATGCTAGTAGTTTAAAAGCTTAAACAATCAATCG 

TCCATTTTATTAATATATTCTTTTTCTTTATAGTATAATTTTTCTCGTCTT
TCTTCTCGTAGTTTTGCACGTTTTTTTAATTCTTCTGGCGTATTAAACTCT 
TTTTGGCTTAATAAGATAGTAGATTGTCTAGATGCATCGTCTTTCCTGTAC 
ATATATGCACCATTACGGTAGGCCGCTCTACTTATCAAATGCATTCCAACC 
GGAGATGTAAGATTGATAAAAACTAGTGATAATAATAATCTGACACTGAAA 

AAACCTGAATTCACAATAAAATAGATCAGTACACCAACTACAGTTAGTAAT
ACTGACAATGTAGAACTTTTCGTTGAGGCGTGACTTCTTAGAAAGACATCT 
TGAAATTTTACAATCCCTATTGCACTAATTAATGCAATAATACTTCCTAAG 
AAAATAAGTATOTTATTATCTAATTAAAAAACAATAGAATACTTAGTTTGA 
CTGTTCATGATTAAAATATTCTTTTAATTCTTCCCACTAAATACGAATAAA 

AGCATATTTTTACTGTTCTTACGCATTCTTCCGATTAGCTTTTCTGTATCG
TTTTTCATTCAATAATAATTTTCTTAACATCTTCTAAGTAATCATCTTCAA 
TCATTGGCTTTAAAGTATTTCCATCAATAGCATTTAACATTACATCTTCTT 
TTTCCTTCAGTCTGTTATAAACTGCTTTATCGATAAAACCTTTATGGGCTC 
TATCCCCTTCAGACATCAAATAATAATACCTTGTATATTGATTATTATTTA 

ACCCTAAACGATGTATTCTATCACGTGATTGCAACATAAACGTTAAATTAA
AGTTATATTCAAAATATATTGCATCATGTACTGTCTGATGTAAAGATATGG 
ACTCGCCTAATGTATTAGGATTAGATATTAGAACTTGTGCATTTCCATTCC 
TAAAATTATTGATCATATCTACCCTATCTTCTTTAGGTGTTTCTCCATAAA 
TCAATATTGAATTAATATCACTTTCTAGTAACCTCTTATTGATTTTATTCA 

TTGTACCTACAAACAATCCCCATACTAGAACTTTTTTTCCTTGGCTAACTA
AGTTTTCAATCAATTCAATACCTTTTTCGAATTTTGGAGAAGTTACATTTT 
TTAAATCAAAAGAATTATAAATTTGTTGTTTTACTTGATTTTCTTCT^ 
TATTTAATGCTTTATCCAAATCGAAATTTAATTCATCATTCAACATTCCTA 
GTTCGCTATAATTAATATTCTTTTGCAATAATTCTGGGTTAGTTGAAGCTT 

GTAGTAACCTTATATAAATCGCTAGTATCCCAGATTCATTTTCGTATATCG
CTTTTGCTAATTCAATTTGAATATTACTAGGCTTAACACATAATATAATAT 
CATTTTCTGCTTGAGGCACTTCCAAATCGTTCTTATTTGTACGCCAGAAAA 
AAGGGTTCAACTTGTCATTAATTTCATTAGGATCAGGATTTTGTAAATCAG 
CAACATTCCAACCAAAATAAGTATCATACTCATCTTTATATAAAATATTTA 

AGAAGTTAAATATGTCTTGATAACTATTTGGAATTGGGGTACCAGTTAAAA
CGTATCTGTAATAACTTTTAGGACCTAAAGTTAATGCATAACTTGCTCTGC 
TACTATTAATACCTTTTATCCTATGAACCTCATCATAAACTATCATTGTCT 
TATCATTAATAAGTTCATTTAAAACCCCTACATACTTTTGTATTGCTTCAA 
AATTCAAAACAATGACATTTGCACTTCCCCAATCTGTTCGTACTTTACCTA
AATCATTATATTTTTTATCTCTTAGGTTCATAAAGTGTAATTCTCTTTTAT
CTTCAAAAACTTCAATAAATTCTGAACGCCAAGCTTCAAATGCATTAATTG 
GAGAAATAACTATTAATTTATCAACTTTCCGCTTTATTTCACTA 


Sequence 3373

step.1002b11.cons.ok

GCCTTCTTATAAAACGATTAATCGATTCAAAAACAAATAAAAAAAAATGAA 
GAATTATAGTTGGGAATATTTCAATGTCCAAATTAATCAAAAGCTTTCAGA 

ACGAAAAGCAAAAACCATCTACAGTCAAAGAAAAATTGATGTGGAGTCTGT
TTTTGGAATTATGAAGCCTATTTTGAGTTTCACTCGAAAGTCCGTTCGAGG 
GATAAATAAAGACAAAAGAGAACTAGGACTTGTGCTAATGACACTTAATAT 
AAGAAAAGTACCAGCTCAACGAGCTGAAAATTATAAAAAATCATAAAAAAG 
ACAATTTTTATATTATTTCAATAGAAATTGTCTTTATTTACTTATCCTGGA 

ACTTTATGTTCCGGACTCTTTCCTACGATTAGTCAAGTACTTATAATTAAA
ATAAGCATTACTTAACTAAACCGTTCATTTTAATTTTTTTGAATAAGGTGT 
TAAACTATATTTGATTGTACGTTTGGCTAATGTGGTGACATTTGATAATCG 
TACAATrrATAATTCATTACAAGATAATGAATTGTTTTTAATAATCGATTT 
ATACAAGCAATGATGGCAGTCTTATGAGGTTTCTCATTAGGCTGCTTTCTT 

AGTTTGTAGTAATAATCGACGACATGATTGTCATAATGATGCTGCCCTCTT
ATTATATTCATAATCACCCAAAATAAAAGTTTTCTCGCTTTTTTATTACCA 
CGCTTGTTGATGGTATCTCTACAGTGTGTATGACCTGATTGATATCGTTTG 
ATATCAATGCCAACAAAAGCATTGAGTTGTTTATTTGATTTAAATCGCTTA 
ATATCACCAATCTTCCCAATAATCATAGCTGTGCTTAGCTTACCAATACCA 

GGTATCGAATGAATATTTTCAAAATAATCGAGTTGTTGTGCTAATTGAATC
ATGGCATCATCTAATTGTTTGATATGATGAATAGATTGTTTTAATTGTTGA 
ATAAGTAAGCGTAATTTTTCGACTAGAAAGGAATGTCTATCGACATTAGGA 
TAGCTTTCTTGAGCAATCACCCTTAATTGAAGTGCATATTTTGTAGCTTTA 
TCCATTGACATTCCCTTATCTGTAGAATTGAATATATGTGTAATCAGTACC 

TCCTTGTCGATATCAAGAACCATGTCTGGATGAGTAAAGATTTCTGCGATG
TTGAGTGCAATGATTGAATATCGACTACTAAATAATCTTTCTAAACCAGGG 
AATGTTTGATGGAGTAATTCAAGGATCTGAAATTTAAGTCGATTTTGTTCA 
TTCTCGATTTCTAGATGAAAACGGACGCGTTCTCTTAATTCAAAGAATATT 
AACTCATGTATAGGTAAGTTGTCTGTTTGTTTAAGCGTCGGTCCTAAACAA 

GCAAGCTTATGAGCATCTGCCTGATCAGTTTTCCATGATCTTAGAGCGCTC
GTTTTAAATTTGGCTTCTAACGGATTCATTTGAATATAGTTAATTTGATTT 
ACACAACAAAATCGTTCCATACCTCTTGAATAGATACCTGTAGATTCAAAA 
ATGAGTTGTGGGTGGTCTAAGTCATTCAAATACTTGAGTAAATAGTTGTAA 
CCATTTTTATTATTTTGGATGAAAAACTCTTTTTGGAATTTTTCATTTTTA 

TAATGTGC7ACTACACTACTTCTTTTACTAATATCAACACCTAAGTAATCG
ATAAAAAAACCTCCTTTGAATAATTGAGAAGCTAAAAACTTTACTTAACCT 
TTCTCATTTCATTTTCCTATACACGGTTTCAAGAACCCAACATACTACAAA 
CGAATTTCAAAAGGCGAGAGTAAAGCTGACTTGTTTTTTATACGGATTTAA 
AATCCAAGAGTCTGGACAGTCTACTTCTCTCTATAACTATAAAAAATAGCT 

ATGAAAAAATCTATCGTCATAGATTTCTTCATAGCTAATCTTAGTATGTTT
TTATTTTATTGAAATTAATTGTTTATTAGCGATATAACTTGTCAATGATAG 
AGTCGTTATGTCATTATAAATGTAAGGGCTTACATATATGTTTTGCATCAA 
CATTTAGTCTGTTTCATACAATTCAATTTTTCGATTATGACGATATTGATT 
CATTTTATTGTTATATTAAAAAATAAGTTTGAGAGTCGATAGAAATCAGTT 

ATGATATTGTTAATAAGTTTTTATTCGAAAACGAAGGGGAGATTTTATGAC
TCAATCTGAAAAAATTATTAACTTAACAAATCACTATGGGGCGCATAACTA 
CGTTCCACTTCCAATTGTTATTTCTGAAGCAGAGGGTGTATGGGTGAAAGA 
TCCGGAAGGTAACACGTATATGGATATGCTTTCGGCCTACTCGGCAGTGAA 
TCAAGGTCATCGACACCCAAGAATTATTCAAGCATTGAAAGATCAAGCAGA 

TAAAGTCACTTTAGTATCACGTGCTTTTCATAGTGATAATTTGGGTCAATG
GTATGAGAAAATATGTAAACTCGCAGGTAAAGACAAAGCATTGCCTATGAA 
TACGGGAGCAGAGGCGGTTGAAACAGCTTTAAAAGCTGCTCGTCGTTGGGC 
TTATGATGTTAAGGGTATTGAGCCGAACAAAGCTGAAATTATCGCTTTTAA 
CATAAAAGCTATCAATAGCAATACAAATACATTGACGACCAAATTTCTCGC
TTGCTTGTCGAATTAATTCTGGATGTTTTAAAGCGCTTGAATTGAGTGATA
TTTTATCTGCTCCGTGATTTAATAGTTGCGTAATATCATCTAAATTTTGAA 
TCCCTCCTCCTACTGTTAAAGGGATAAATAATTGTTTTGCCGTTGCTTCTA 
TCACTTCTATCATAAGATCATGTCCTGCTTCCGTCTTCGAAATATCAAGAA 
AGACTAGTTCATCTGCACCGGCTTCATTATAATAAAGAGCCAAATCAACT6
GATTACCGATATCTCTTAATGACTGGAACTGGATACCCTTTACGACGCGTC 
CATCTTTAACATCTAAACATGGAATCACTCTTTTTTTAATCAAGATAATCC 
CTCCCAGAATTCATCCAGATGTGCTGCTTTTCCTACAATAGCAGCATGAAC 
ATTTAACGATTCTAATCGAAACAAGTCCTCTTGATGTCTAATACCTCCTGA 
AGCAATTACAGGCAACGATGTATAAAGTGCGAGACGACCTGTCCTTTAGAG
CACAGTGGCGATGATATC 



Sequence 3374 

step.1002b12.cons.ok

TATTTAGCTGCTATCAAAGAACAAAATGGTGCAGCAATGAGTTTAGGACGT 
ACTTCAACATTCTTAGATATTTATGCTGAACGTGATTTACAAAATGGTGAC 
ATCACTGAACAAGAAGTTCAAGAAATCATTGACCACTTCATTATGAAATTG 
CGTATCGTTAAATTCGCGCGTACGCCTQAATATAATGAATTATTCTCTGGA 

GATCCAACTTGGGTAACTGAATCTATCGGTGGTGTAGGTATTGACGGACGT
CCAATGGTAACTAAAAACTCATTCCGTTTCTTACACTCATTAGATAATTTA 
GGTCCAGCACCAGAACCAAACTTAACAGTGTTATGGTCTACTCGCTTACCT 
GAAAACTTCAAAATCTATTGTGCTAAAATGAGTATTAAAACGAGCTCAATC 
CAATATGAAAATGATGACTTAATGCGTGAAAGCTATGGCGATGATTATGGT 

ATCGCTTGCTGTGTATCTGCCATGAAGATTGGTAAACAAATGCAATTCTTC
GGTGCACGTGCTAACTTAGCTAAAGCATTACTTTACGCTATCAATGGTGGT 
AAAGATGAAAAATCTGGTAAACAAGTTGGGCCAAGTTATGAAGGTATTAAA 
TCAGACGTACTAGATTATGATG7VAGTCTTCGAAAGATATGAAAAAATGATG 
GACTGGTTAGCTGGCGTATATATCAACTCATTAAATATCATTCACTATATG 

CATGATAAATATAGCTATGAACGTCTTGAAATGGCTTTACATGATACAGAA
ATTATTCGCACAATGGCAACTGGTATTGCCGGATTGTCTGTAGCAGCTGAC 
TCTTTATCAGCGATTAAATATGCACAAGTTAAACCTATCCGTAACGAAGAA 
GGTCTTGTAACTGACTTTAAAATCGAAGGCGACTTCCCTAAATATGGTAAT 
AATGACAGTCGTGTTGATGAAATTGCAGTAGATTTAGTTGAACGTTTCATG 

ACTAAATTACGTAGCCATAAAACATACCGTAATTCTGAACACACAATGAGT
GTATTAACAATTACTTCAAACGTTGTTTATGGTAAGAAAACTGGTAACACA 
CCAGATGGACGTAAAGCTGGCGAACCATTTGCACCTGGCGCAAACCCAATG 
CATGGTCGTGACCAAAAAGGTGCATTATCTTCACTAAGTTCAGTAGCTAAA 
ATACCTTACGATTGCTGTAAAGATGGTATCTCAAATACATTTAGTATCGTA 

CCGAAATCACTAGGTAAAGAAGAAGCAGATCAAAATAAAAACTTAACTAGT
ATGTTAGATGGTTATGCAATGCAACATGGTCATCACCTCAACATTAACGTA 
TTTAATAGAGAAACATTAATTGATGCAATGGAACATCCAGAAGAGTATCCA 
CAATTAACGATTCGTGTATCTGGATACGCTGTAAACTTCATTAAATTAACA 
CGTGAACAACAATTAGATGTTATTTCACGTACATTCCACGAATCTATGTAA 

TAAATTTAAGGTGGGAGCAAAATGCTTAAAGGACACTTACACTCCGTTGAA
AGTATGGGCACTGTCGACGGGCCAGGACTTAGATATATATTATTTACTCAA 
GGTTGTTTGTTAAGATGTTTATATTGTCATAATCCAGACACTTGGAAGATT 
AACGAACCATCAAGAGAAGTGACGGTTGATGAAATGGTAAATGAAATCTTA 
CCGTACAAACCTTACTTTGAAGCTTCAGGTGGTGGGGTAACAGTCAGTGGT 

GGCGAACCATTACTACAAATGCCTTTCTTGGAGCAATTATTCAAAGAATTA
AAAGCGAATGGTGTTCACACATGCATTGATACTTCTGCGGGATGTGTGAAT 
GATACACCAGCATTTAATCGTCATTTTGATGAATTGCAAAAGCATACAGAT 
TTAATCTTATTAGATATTAAACATATTGATAATGATAAGCACATCAAATTA 
ACAGGCAAACCTAACACACATATTTTAAAGTTTGCACGTAAATTATCTGAT 

ATGAAACAACCTGTTTGGATTAGACATGTTrrAGTACCTGGTATTTCGGAT
GATAAAGAAGATTTGATAAAACTAGGAGAATTTATTAATTCTTTAGATAAC 
GTTG/^AAAGTTTGAAATCTTACCATATCATCAACTCGGTGTGCATAAGTGG 
AAAAATTTAGGCATCCCTTATCAACTCGAAAATGTTGAACCATCTGACGAT 
GAAGCGGTTAAAGAAGCTTATCGCTATGTTAACTTTAATGGCAAAATACCC
GTAACATTATAGAAAGATAAATATAGTAATAGCCTGAGACATCAATCGTTG
TCTCAGGCTTCTTTACGTTTAATCAGTCGATAAACTGAGTATAAAATATAT 
CTCAATTAACTTTATTTAATAAAAATATGATTCATAAGTTTAAATAGTTTA 
TTTTTCTAGATATGCTTTAAATACATCTTTATAAACATCGATGTAACTTAA 
ATACATATCTTTTCTAATAAATTCATCAACTTGATGCGCCATAATAGATTC
ACCAGGTCCAAAGACAGCAAAATCCACGTTATTTTCATTTGTTCCTAGGAA 
ACTAGATGCATCAGTTGTACCAATCAATGCACTCACAACAATGTCTTCATG 
TACATAATTCGGTGCAATACGTGTAATATTTTGAATAAGAGGATTATCACG 
ATCACTTGCCACTGGATCGTGACTGCTAGGTATATCTACAGTTAAATAATC 

TTCGCCCACATGACGAATGACTTTTTCAAATAAATCCTTCACGAAAGTACT
GTCATATTCTGGAACAGTTCTTACATTATATTTAGCTGTCGCTTTATGAGG 
AACAGAATTAACTTGTTTACCACCATTGAATACAGAGTTTAACATTACAAA 
TCCAGAGTAGATATGTGATTCTTCTTCACCAATTTTTCTGTGGAGATGTTT 
CTCAATCATTGGAACAGCGTCTAACTCGTGTACTTTATCATGTTCTTTAAT 

ATTTTTATATTCTTGTTTCATTTCATTTACAAAATCAACTAAAATATCAAC
AGCATTTGTACCTAGGTGTGGCATGGAACTATGTGCAGCCTTACCTTTCGC 
TGTGACTACACAGCTCATAGAACCTTTATGAGCATAGTAAGCGATATTACT 
AGTTGGTTCGCCAATAATTAAACCACTCACATCATCTAAGTAACCTTCATC 
AGCTAGTAATTGTGCACCATATTGTTCCGTCTCTTCTCCTGTAGTAGCTAA 

TAAACGTATCGTACCTTGCTTTAATGCGTTTGATTGTTTTAATTCAATCAT
CGCGATGACCATCGCCATAAGACCACCTTTCATGTCAGTGGTACCTCTGCC 
GAATAACTTGCCATCTTTATCTGTGAGTTCAAATGGAGGAAAAGTCCAATC 
ATCATGGTCACCAGCATCTACAACATCCATGTGTCCACTGATTGCTAAGAC 
TGGAGCGCCACTACCAATTTCAGCAACTAAATTAGCGCGTGAATCATTAAC 

TTTAACAATCTTAGAATCAATATCATATTGACTTAACA6GTCTTTTAAGTA
CTCACATACTTTAATTTCATGATCATTCTCAGTTTGAATCTTGACGATATC 
AGCTAATAATCTTATTTTGTCTTGCTCACTCAATACAGTCATTTCACAGTC 
CTCCTCATTTTTAGTAATGATTTTCACTTTTTAATAAAAAATAAAAATCCT 
TGTACACTTAATATTATCAAAAACTTTTATTAATTAAAAATTATATGCTAA 

AATAAATGGATTTTTAATAAAAAATAAATATTAACTTAACATTTCTTATAA
AAAACAACAATGTATAGAAATATTAAAAATTTTTCATTATTTATAGTTTGA 
ATCGTGTTATAAAATGAAAAGCTATGA 



Sequence 3375

step.1002c01.cons.ok

TCAACATTATAATATTGATTAATTAAATAACAATTGCAAGTATAATTTAAA 
TATTTCTCTAATATACAGTGTCAATTTATTTTATTCACATAAGAAAATAGC 
TATGAAGAAATCTATCAATTTAAATTTCTTCATAGCTAATTTTTTTCATTT 

AAATTTATTGACGGCTTGAAAAATGAGTCAAAATCATCAATAACATCAAAT
TGCAAATATATTCCTTTGGTAATGGATTGACCATTAAACTTAATTCGAATT 
CTATTCTTTTCTATACAATGAAACGGGTGTCATACATCATCGGTAACTAAT 
TATGATAGATATGAACTTGTGGTTCTTTATCGTCTTTAGTTTTACTAATGA 
GAGCACGTGGAGTATTTCCATCTTTGATTCTAATTTCATACTCATCTAGTT 

TATCAAAATATTTTTCGGCTTGCTCTGTAACATATTGTGTAATACCTATCG
TTTCTGCCTGTCCGTAATAATCTATAGGCAAATCAACTGTAAGTTGTTTAG 
CTTTTTTATTTACGAATTTAACCTTACCAACTGCTTGTGTGAAGTT^ 
AATACGATTGCAAATTATCATTAAACTGTTTAAAGTTATTATTCAACGTTT 
CATCATAATCAGCTGCAGTTGACGAAGGAATTAAGGCTGCTTTTTCATTAA 

TATTATCCCATGAGTTAATTTTAGTTTTACCCTCTTCAACCGTAGTACCAA
CTATAAATTCACCTGGTGTAATGGAATCTTGACTTGATTGTTTATAGATAG 
CAAAATGAATAGGAATATCTTTCAAATCACTATTCTCACGTAAACGAGAAA 
GCATTTCACTAGCCATCTGTTTACCTTGCTTTTCAATCTCTTTATCAGATA 
AATCTTTACTAAATGTTTCGCCATCTTTCTCTTTTTTGTAATAATAAACAC 

TATTCATAGCTAAACCAATTGTCATCCCTTTTATATTTTTACCTTTAG^
CACTATTTCCATAAAAATCCTGCTCGAGTATATTTGAAAGATAGGCTGGAG 
AATTTTTAGCTATTTTCTCTTCATCTGTTTCACCATTGTGAGATGGATTGA 
GTCCAAGATTCTCATTAGCATTTTTGCTTTTCTTTTCTTTTTCGCTCATCT 
TGTCAATTTCTTTTTTCGTATACTTCGGATCTAAGTATGCATTAATCGTTT
TTTTATCTAAATATTGTCCATCTTGATATAAATACTTATTTGTTGG7VAAGA
TTTCTTTACTTAATTCTAGTAAACCACTTTCAAAATCTTCTCCATTATAAC 
CATTTGCCATATTATCTTGTAATT^TCCACGAGCCTGGCTTTCTTTGAAGG 
GTAATATAGTCCTATAGTTATCACCTTGAACTTTTTTATCAGTCGCTATTT 
GTTTCACTTGATTTTTATTATGGTTATCCTTATGTTCACTTTGTTCTTTAT
CTGATGAAGTTTGTTTATGTCCATCTCCGCAAGCCGTTAATAATAACAGTA 
TCGACATGAGTAAAAATATTGTTCGCTTCATTCACGTACTCCTCTAATTAT 
TAGATTCCATTTTGTTTTTCAATAAATGCTGCTTCAGTCCAAATTTCAGTA 
CCATACTTCTCAGCTTTGGCTAATTTAGACCCTGCATCTGCTCCAGCTATG 

ACAATATCAGTACTTTTAGTCACGCTGCTTGTAACTTTAGCACCTTGCATT
TTCAACCATTCAGATGCTTCATTTCTCGTCATTTGCTCGAGTTTCCCTGTT 
AATACAATTGTTTTCCCACTAAAATCAGGATGACCTTCGATTTCAGTTGTT 
TTAATTCCTTTATAAGACATATTAACATTTTTATTACTTAATTTTTCAATT 
AATGAACGAATATCACTATTTTCGAGATATGTTACAACAGATTGTGCAAGT 

TTATCTCCAATATCTTGAATTTCAATTAATTCACTTTCAGTTACTTTAAAA
AGTTGATCCATCGTTTCATATCGCTCAGCAAGTACTTGACTAGCTTTTACA 
CCTAAATGTCTAATACCAAGTCCAAATAATAAATGCTCTAATGACTGTTCT 
TTAGATTTTTCTATCGCTAATAAAAGATTATCAACTTTCTTCTTTCCCATT 
CGCTCTAATGGTAATAAATCTTCTTCTTTCAAATAGAAAATATCTGCGACA 

TCTTTGATTAACTGATTTTCGTATAGCTGATGAATAATTTTAGTACCTAAA
CCATCTATATTCATCGCTTGTCTTGAAACGAAATGTATAAGTCCTTCAATA 
AGCTGTGCCTGACATTTTGGATTAATACAACGTAAAGCAACTTCTCCTTCA 
ATACGAACTAATTCATGTCCACAACTAGGACAATGTGTTGGCATATGATAA 
ATTTCCGATTCGTTAGGTCGTCTATCTAAAATACTTTTTACAACTTCAGGG 

ATGATGTCCCCGGCTTTTTTAATAACAACACTATCTCCGATACGTATATCT
CTTTCATGTATTAAATCTTCATTATGAAGTGAGGCTCTTGAAACTGTAGTA 
CCAGCTACTTTTACAGGTTCTAGAATTGCAGTTGGTGTCACAACACCCGTA 
CX3CCCAATACTTAGCTCAATATCCAATAATTTTGTAATAACTTCTTCAGCT 
GGAAATTTATAAGCAATCGCCCATCTTGGAGATTTTTGCGTATAACCCATT 

TCCTCTTGTTGAGATAAATCGTTAACTTTTATAACAATACCATCAATATCG
TAAGATAAAGATCCTCTTTTGCTTGTCCATTTCTCTATATAATTAAGTACG 
CCCTCAATATCTGATACTCGTTCACGTTCTTGGTTAGTTTTAAAACCTAAT 
TGGTCCAATTCCTCTAGCGCTTCACTTTGTGTTGTTGCATTAAACTCGGTT 
AGGTCATTCACACTATATAAGAAGACGCTTAACTTTCTTTTCGCAGCTAGT 

TTAGAGTCAAGTTGTCTTAAAGAGCCTGCAGCAGCGTTTCGTGGATTTGCA
AAAGGTTGTTCACCATTTTGTTCTTTTTCATTATTCAAATGAATGAATGAA 
CGACGTGGCATATAAGCTTCCCCACGGACTGAAGGATATCAGGTTACGACA 
ACAACATGTGGTAAAGAAGCACTAAAATTATTATCTTCAGATATAGATATT 
ATGATTTTAGACATTATGATGCCGGAAGTTAGCGG 


Sequence 3376

step.1002c03.cons.ok

GGTTGTAAAAGATGTAAAAGGAATAGACGTTAGTGGCCCATTCCCACGTAT 
GACATATGCAGAGGCTATGGACCGTTTTG6TTCAGATAAACCTGACACTCG
TTTCGGTATGGAACTTATCAATGTGTCACAGCTTGGTAAAGAAATGAATTT 
TAAAGTTTTTAAAGATACGGTAGATAACAACGGCGAAATTAAAGCAATTGT 
CGCAAAAGACGCTGCAAATAAATATACACGTAAAGACATGGATGCATTAAC 
AGAGTTTGTAAATATATATGGTGCAAAAGGATTAGCTTGGGTTAAAGTTGT 
TGATGATGGTTTAAGTGGCCCAATTGCTAGATTTTTCGAAGATGTTAATGT
TGAAACACTTAAACAGTTAACAGAAGCTAAACCTGGAGATTTAGTAATGTT 
TGTAGCTGATAAACCTAATGTTGTTGCTCAAAGTTTAGGGGCTTTAAGAAT 
TAAATTAGCAAAAGAATTAGGTTTAATTGATGAATCAAAATTAAATTTCTT 
ATGGGTAACTGATTGGCCGTTATTAGAGTATGATGAAGATGCAAAACGTTA 
TGTAGCAGCACATCATCCATTTACTTCACCTAAAAGAGAAGATATCGAAAA
GCTAGACACTGAACCTGAAAATGTACAAGCCAACGCTTATGATATTGTTCT 
AAATGGTTATGAACTTGGTGGTGGTTCTATAAGAATACACGATGGTGAATT 
GCAACAAAAAATGTTTGAAGTATTAGGATTTACTAATGAACAAGCTCAAGA 
ACAATTTGGTTTCTTATTAGATGCTTTTAAATACGGTGCTCCACCTCATGG
TGGCATCGCGTTAGGTTTAGATAGACTTGTGATGTTATTAACAAATAGAAC
AAACTTGAGAGATACAATTGCATTCCCTAAAACAGCATCAGCTACATGTCT 
TTTAACTGACGCTCCAGGAGAAGTATCTGATAAACAACTCCAAGAACTCTC 
ACTAAGAATCAGACACTAGGGCCCCAACAAAGAGAATTTCACAAAGAAATT 
CAACAGGCAGAGCAAGTTGGGACTAGGGCCTTAATTTAATGTTTCTTAAGA
AATTAGCACAGAGAGCTTAATTGTACGAATTGAATATAAAACATCGGTGTG 
TAATATAATT/IATCAGACATTATATTATAATTTTGGTAACTAGTTTAGAAT 
ACACGAGAAAATAATACACATTTGAGTTAAATGATAAGTCTTTAAAATTTA 
CTTCTCTTTTTAAATTGTGGTATGATATTTTCATAAGATAGTCATCTGTGG 

TGTTCGTAAGTTTGCTTTTTATTTGGGCCTAACACTCTTTGATCAGGGAGC
CCAATAGGTTTTCTCGCAGCGCACACGCCTCTATAGGAGGACTTGCAAAAC 
GAGAAACAGGGCACCCACCTGTATATAGCAGGCCGAATGATCAAGCTATTT 
ATAACTACGGCATCAACGGACTCTATCGGTACGCAAGACTTTTGTCTTGCG 
TATTTTTATGTATATAATTATAAAGAAGCATAGATATAATGAATTGGAATC 

AAAGGAGCTAAAGATGAAACATCAATTTTCAAGGAATGAATTAGCAATAGG
ACAAGAAGGGCTGJ^ACTTACTAAAAAATAAGACTGTTGCAGTTTTAGGTGT 
TGGTGGCGTCGGGTCATTTGCAGCTGAGGCATTGGCTCGGACTAATATAGG 
GCACATCATACTTATAGATAAAGATGATGTCGATATTACAAATGTGAACAG 
GCAAATTCATGCACTGACTTCAACTATTGGTCAAAGTAAAGTCACGCTAAT 

GGAAGAAAGAATCAAATTAATAAATCCCGATTGTAAAGTAACTTCTTTGCA
TATGTTTTATACCGAGGAAACATACAAAGATATCTTCAATAATTATGATAT 
TGATTATTTTATTGATGCAAGCGATACAATCATTTATAAAGTTCATCTCAT 
GAAAGAGTGTTTAGAAAGAGGAATTGAGTTAATTTCAAGTATGGGTGCAGC 
AAATAAGACTGACCCGACACGTTTTGAAATTGCAGATATTTCAAAAACACA 

TACTGATCCTATGGCTAAAGTAATTAGAAATCGTTTAAAACGCCTTGGTAT
TCGTAAAGGTGTTAAAGTAGTATTTTCTGATGAAAGTCCTATTGTTATTCG 
CGAGGACGTAAAAGAAACAGTAGGAGATAAAAATGCAATCAATAGAAAAGG 
GCAAATGCCTCCATCTTCTAATGCATTTGTTCCAAGTGTAGTAGGCCTTAT 
TTGTGCAAGCTACGTTGTCAACGATATTTTAAAAGATATACCTGTAAGGCG 

AATTAAAGATAAAGGACAAAATTAAAGTTATCTATCTCATTGCAAGAAAAT
CATCATCTTCTATTTTTGTGAAATTTGATTCATAGTTTAAAAAAAACAAAA 
TGTATTTTAAATAAATCTACAACAACTCATAGTAAAAAAACCTGAGGCATC 
AGTATTGTCCCAGGTTTTTTGCATCTAAGATATTAATTTCAACTACATATT 
TTCAATGAATAAAATTATGTTTTATTTCGTAAGTTATCGTAAATTGTTTTA 

AATTGTTGTTCACTTTTTGATGTGGTTTTTGGTTCATAATACACTCTGTTT
TTTAATTTTTCTGGTAAGTACTGTTGTACAACATGACCATTTTCATAGTTA 
TGAGGATATTTGTAACCTATAGCCCGTCCTAACTTTTTAGCACCTGAATAA 
TGACCGTCTTTCAGATAATCCGGTATTTGACCTATATGTCCATTTCTTATA 
TCGCCAAGAGCTTTATCTATTGCTGTGATACCAGAATTTGATTTAGGTGAT 

AAGCATAATTCTATTACAGCTTGACTAAGGGGAATTCGTGCTTCTGGAAAA
CCTAGACGTTCAGCTGATTGTATTGCTGCTAAAGTCCTCTGACCAGCATTA 
GGTGATGCTAACCCAACATCTTCATAACTAATTACAAGTAATCGACGTACG 
ATTGTAGGTAAGTCACCAGCTTCAATTAGACGTGCTAAGTAGTGGAGTGCT 
GCATTGACATCACTACCTCTAATTGATTTTTGAAAAGCGCTCATGACGTCA 

TAATGCATGTCTCCATCTTTATCACTTACAAAAGCACCTTTTTGTAAGCAA
TCTTTTGCGTCATCTAATGTAATATGTCTTTCGTTTTCTTCACCAATATGA 
GCACTTAGTACAGCTAATTCCAAAGCGTTTAATGCACTCCTAACGTCGCCT 
TGGCTTTGCGTTGAAAAATATTCGATAGCATCTTCATCTACAATAGGATGA 
TAAGTACTTAAACCTCTTTCCTTATCATTTATAGCTCTATCTAATGCCAAT 

CTTATATCATCTTGATCTAATGGGTAAAGCTCAAATATTTGTGCTCTAGAT
CGTATCGCTGGATTTATCGCATGATAAGGATTTGAAGTTGTTGCACCTATT 
AATACAATTTTGCCATTTTCAAGATGTGGTAATAAAAAATCTTGTTTAGCT 
TTGTCTAAACGA 


Sequence 3377 

step.1002c04.cons.ok

ACATTATTGCGTTTAATTGCAGGTATTGAGAATGCAGACGAAGGACGTATT 
CAATACTTCAATCAATATTTGTCAAGACGTCGAATACGTCATATTGTAGGC
TATGTCCCTCAAGACATCGCACTATTCGAGCATATGACTGTCATGGAGAAC
ATTGAGTTTTTCAAGTCACTTTGTGAAAATCCTATTTCAGATGAAACACTT 
CATTCTTATTTATCACAATTAAATTTTACTGATACAAAAGTGAAAGTATCT 
AACCTTTCTGGGGGAAATAAACGTAAAGTCAATATTATGATAGGTCTACTT 
AGTCGGCCTAAAATACTTATTCTAGATGAGCCAACAGAAGGCATTGATTTA
GAATCAAGATATGATATTCACAACTTATTACAACAAATGACCGATCAATGT 
TTAATCATCATGACGACACATCATTTAGACGAAGTTGAAGCACTAGCAGAT 
GATATTAAAGTTATAGGTCAAAATCCTTTTTATCATGATATTTTAGAAAAT 
AAAGGTTGGTCTTTTAAAAAATATGCAAATGCCTTAGCTGATAATACGAAA 

TCTTAAGCTGTACTTTCACTCCATCACTTCATCAATATTTAACATCAAAAA
GGCTTGCCCTTTCGTCGTATATTTCTCGATTGAAAAGGCAAGCCTTAATAA 
ATTTTATACACATTTATTTATCGTGTCTAACTTATCTGTACACTAATCCAC 
CATCAGTTAAAATGGATTGACCTGTAATATAATCAGAATCATTTGAAGCTA 
AGAATGATACTAAGTTCGCAACATCTGATGGCTCTTGATATCTTCCAAGTT 

TGATTTCTGAAGAAAATGCTTCAAACGCGTCTCCTATTTCTAGACTATCAT
CAAGTTTCACCATTTCTTCATCAATACGATCCCACATTTCAGTTTTGGC/^A 
CTCCTGGGCAATAAGCGTTTACGGTAATGCCTTTGTCTGCTAATTCTTTAG 
CAGCTGTTTGAGTGAAGGAACGTACAGAGTGTTTTGTAGCTGAGTAAGTGC 
CGAGTACTTCATAAGATTCATGTCCTGCAATACTACATGCGTTGATAATTT 

TACCTTTACTTTTTTGTTTAATAAATTGGTTAGCAGCTGCTTGAATTCCAA
ACAATGTACCAAATACGTTAATATTAAATAACTTAGATAATTCCTCTTCAC 
CAATTTCTAAAATTGGCGTCACTGCATCAACACCTGCGTTATTCACCATGA 
CATCTAATTGACCAAATTCTGTAACTGCGAACTGAACTAATTCTTCCTGTT 
CTTTCTTTTTAGATACATCACTCTTGAATGCGACAGCTTGATAACCTTTTT 

CTTTGAATTCTTTCTCAGTTTCTAATAGAAGTGCTTCATTAATATCTTGTA
GTACTATATTAAACCCATCGTTAGCAAGACGTTCTGCAATACCTTTACCTA 
ATCCACCTGCTGAACCTGTTATAATTGCTGTTTTACTCATATCTATCTTCC 
TCCTTATTTATCATCTACCCTTTAAACATGGTGGTAGTAATAAGATTGGGA 
AATGTCAAATGTTGTTAATATATTTGTCGAAAAGTTGTTGTTTTTATGCAT 

TATTAAATACAACTTCAAATTACTAACACTGCTTTTTACTGCAACAATATA

ATGCGATAAACACTAAACCTGCACCCACGATAAATGCGATAGTAGCTGCTA 
AATATGCCGGGTGATTTACAGATAATGCCGTATATATTGTTGTCACAACAG 
CGATTCCAAATGCTGCACCTAGTGTAGAAGTCATTTTGATAATTCCTGATG 

CAGTACCTGCTTTTTCAGCTGGAACATTAGATACAGCTGTAGATAGCGCGG
GTGTAGCAAAAAATCCTAATCCTAGACCTATAAAGATAAAGCCAATACATG 
CCACAATATAATAAATCATATTAGGCAAAGAAGTGAATGCTAAAAGTATAA 
TACCAACAGTAATTGATACCGGACCTAACATCAATGGCAATTGCGGTCCTC 
TTTTTTGCATAAAACGTTCACCAACACG/^TCATTAACAAACTACACAGCA 

TATATGGAATTGTAATTAAACCGGCTTGTGCAGCTGATAGATGTTTATCGT
CTTGAACATAAATATTAAATAATGCTAATGAACCAATATCCATGTTCACCA 
TCAAGTTGGCTAATGTTGTTCCAATATAAACATTATTTGAAAATAAACTTA 
AATCTATAAAAGGTTCATCTTGACGTTTTTGAAATATATAAAATGCAATTA 
ATGTCACAATGAATATAGCAATTAATATCAATATAAGAGGATTTAACCATC 

CAATTCTATCTCCTTGCGTAATCACTACGTTAATGCTTAACATCATGACAA
CAAAAATAATGATTCCAACGATATCAAATTTTTTATTATATGCAGACTCAT 
CTTTAGATTCTGGTATACCTCTAAGCAATATGAGTGAATTAAGTTTAAATA 
ATATAATAAGAGAATTTAACAAGCGTTTTGAAGAAAAATATAGTCAAAGTT 
ATGAATATAAAAAATTAGATTTTTCTGCTACTACAACCATTTATGATTCAG 

AAATATCAGAGTTTAAAGATTGGGTAGATGCAAATTATTTAGGTACTAACG
TTGAA/U^TAACATTCAAACTGAAAAAAGATTTTTATATGAAAGACCACCAG 
TTAGATATGATAGTGTAACACCTGAGTTAGAGTTGTTAAAAAGAAATTACG 
ATAAAAATGTAACTGTATTTGGTAATTTGCCTAAAAAAGCGATACAAGTTC 
CTAAATATACTGGTGGCACTACTACGCCTGATTTTGTCTATATGATAGAAA 

CTGATGAACAAGATGCAAAATACCTTATTGTTGAAACAAAAGCAGAAAACA
TGAGACTAGGAGATAAAAGTATTGGTGAAATACAAAAAAAATTCTTTAACA 
CATTAGATAATTTGAATATTAAATATCAATTAGCTACTAGCGCGCAAGATG 
TTTATAATGAAATTAAAAAATTAGATGATTCAAAGTGAGGGAATATGATGG 
GGAAATCAGAAAAAATTTCATTACTTGAAAAAGTCCAAGATGGTTTAGTAG
ATAAAACCGGAGCAAAACCTAAGTTACCAACAACTATTACCGATTTAAATC
AAGAAACTTTAGAGGTATATAGAATACCACTAAAGTTTCTGTATTATAATG 
ATAGAAATGGAAGAATTGCTTCTGTAATATCCAGAGTAAGCGACGATATAA 
AAGTTGCTTATGAATTTGAAGATAATAATTATAATAAAAGTATTGAAAATA 
TGATATATGAGGCTAATACTTCTGCTTTA/^AAAACACTAAAAAATCTATTA
AAGATAAA 



Sequence 3378 

step.1002c05.cons.ok

AGATATCTCACCATTAGGTGCTGCAGCTTTAAGTGGAACTACACACCCTAT 
AGATAGACATCTGACTCAAGAACTTCTAGGATTTGCAAATCTGTATGAAAA 
TAGTTTAGATGCTGTCAGTGATCGTGACTATATCGTTGAAACATTGCATCA 
TATTTCACTCACTATGGTACATCTATCACGTTTTGCAGAAGAAATCATTTT 

TTGGTCGACTGATGAGGCAAAATTTATCACTTTATCAGATGCGTTTTCTAC
TGGCTCATCTATTATGCCACAAAAGAAAAACCCTGATATGGCTGAACTAAT 
AAGAGGAAAAGTCGGACGTACTACAGGACACTTGATGAGTATGTTAGTAAC 
ACTTAAAGGCTTACCCTTAGCTTATAATAAAGACATGCAAGAAGATAAAGA 
AGGTTTATTTGATGCTGTACACACACTTAAAGGCTCTCTTCGAATCTTCGA 

AGGTATGGTTGCATCTATGAAAGTTAATTCAAACCGTTTAAGTCAAACAGT
AAAAAATGATTTTTCAAATGCAACAGAATTAGCAGACTATTTAGTCAGTAA 
AAGTGTACCTTTTAGAACCGCTCATGAAATCGTTGGTAAAATCGTATTAAA 
TTGTATTCATAAAGGTATATACCTATTAGACGTACCTTTAAGCGAATATCA 
AGAACATCATGAGAATATTGAGGAAGATATATATGATTATTTAACACCTGA 

AAATTGTCTCAAGCGTCGCCAAAGCTATGGTTCAACTGGTCAAGAATCAGT
AAAACATCAACTAAAAGTCGCAAAAGCATTATTAAAAGACAACGAATCAAA 
ATAGTTATTAAAAATAATAGCCATCTAAAATTACAAAAATTTAATTTTGTA 
TTAAGTTATCAAAGTTAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTG 
CTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTG 

AT^CTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGAGCATGTGGTTT
AATTCGAAGCAACGCGAAGAACCTTACCAAATCTTGACATCCTCTGACCCC 
TCTAGAGATAGAGTTTTCCCCTTCGGGGGACAGAGTGACAGGTGGTGCATG 
GTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCG 
CAACCCTTAAGCTTAGTTGCCATCATTAAGTTGGGCACTCTAAGTTGACTG 

CCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCT
TATGATTTGGGCTACACACGTGCTACAATGGACAATACAAAGGGCAGCGAA 
ACTGCGAGGTCAAGCAAATCCCATAAAGTTGTTCTCAGTTCGGATTGTAGT 
CTGCAACTCGACTATATGAAGCTGGAATCGCTAGTAATCGTAGATCAGCAT 
GCTACGGTGAATACGTTCCCX3GGTCTTGTACACACCGCCCGTCACACCACG 

AGAGTTTGTAACACCCGAAGCCGGTGGAGTAACCATTTGGAGCTAGCCGTC
GAAGGTGGGACAAATGATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGG 
AAGGTGCGGCTGGATCACCTCCTTTCTAAGGATATATTCGGAACATCTTCT 
ACGAAGATGAGGGAATAACGTGACATATTGTATTCAGTTTTGAATGTTTAT 
TAACATTCATTTGTACATTGAAAACTAGATAAGCAAGTAAGATTTTACCAA 

GCAAAACCGAGTGAATAGAGTTTTAAATAAGCTTGAATTCATAAATAATCG
CTAGTGTTCGAAAGAACACTCACAAGATTAATAACTAGTTTTAGCTATTTA 
TTTTGAATAACAATTCAAAATATGGTGGAAACATAGATTAAGTTATTAAGG 
GCGCACGGTGGATGCCTTGGCACTAGAAGCCGATGAAGGACGTTACTAACG 
ACGATATGCTTTGGGTAGCTGTAAGTAAGCGTTGATCCAGAGATTTCCGAA 

TGGGGGAACCCAGCATGAGTTATGTCATGTTATCGACATGTGAATTTATAG
CATGTCAGAAGGCAGACCCGGAGAACTGAAACATCTTAGTACCCGGAGGAA 
GAGAAAGAAAAATCGATTCCCTGAGTAGCGGCGAGCGAAACGGGAAGAGCC 
CAAACCAACAAGCTTGCTTGTTGGGGTTGTAGGACACTCTATACGGAGTTA 
CAAAAGAACATGTTAGACGAATCATCTGGAAAGATGAATCAAAGAAGGTAA 

TAATCCTGTAGTCGAAAACATATTCTCTCTTGAGTGGATCCTGAGTACGAC
GGAGCACGTGAAATTCCGTCGGAATCTGGGAGGACCATCTCCTAAGGCTAA 
ATACTCTCTAGTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGTGAAAA 
GTACCCCGGAAGGGGAGTGAAAGAGAACTTGAAACCGTGTGCTTACAAGTA 
GTCAGAGCCCGTTAATGGGTGATGGCGTGCCTTTTGTAGAATGAACCGGCG 
AGTTACGATCTGATGCAAGGTTAAGCAGCAAATGTGGAGCCGCAGCGAAAG
CGAGTCTGAATAGGGCGTTGAGTATTTGGTCGTAGACCCGAAACCAGGTGA 
TCTACCCTTGGTCAGGTTGAA6TTCAGGTAACACTGAATGGAGGACCGAAC 
CGACTTACGTTGAAAAGTGAGCGGATGAACTGAGGGTAGCGGAGAAATTCC 

AATCGAACTTGGAGATAGCTGGTTCTCTCCGAAATAGCTTTAGGGCTAGCC
TCAAGTGATGATTATTGGAGGTAGAGCACTGTTTGGACGAGGGGCCCCTCT 
CGGGTTACCGAATTCAGACAAACTCCGAATGCCAATTAATTTAACTTGGGA 
GTCAGAACATGGGTGATAAGGTCCGTGTTCGAAAGGGAAACAGCCCAGACC 
ACCAGCTAAGGTCCCAAAATATATGTTAAGTGGAAAAGGATGTGGCGTTGC 

CCAGACAACTAGGATGTTG6CTTAGAAGCAGCCATCATTTAAAGAGTGCGT
AATAGCTCACTAGTCGAGTGACACTGCGCCGAAAATGTACCGGGGCTAAAC 
ATATTACCGAAGCTGTGGATTGTCCTTTGGACAATGGTAGGAGAGCGTTCT 
AAGGGTGTCGAAGCATGATCGCAAGGACATGTGGAGCGCTTAGAAGTGAGA 
ATGCCGGTGTGAGTAGCGAAAGACGGGTGAGAATCCCGTCCACCGATTGAC 

TAAGGTTTCCAGAGGAAGGCTCGTCCGCTCTGGGTTAGTCGGGTCCTAAGC
TGAGGCCGACAGGCGTAGGCGATGGATAACAGGTTGATATTCCTGTACCAC 
CTAGTATCGTTTTAATCGATGGGGGGACGCAGTAGGATAGGCGAAGCGTGC 
TGTTGGAGTGCACGTCCAAGCAGTAAGGCTGAGTGTTAGGCAAATCCGGCA 
CTCATAAGGCTGAGCTGTGATGGGGAGAGGAAATTGTTTCCTCGAGTCGTT 

GATTTCACACTGCCGAGAAAAGCCTCTAGATAGATAACAGGTGCCCGTACC
GCAAACCGACACAGGTAGTCAAGATGAGAATTCTAAGGTGAGCGAGCGAAC 
TCTCGTTAAGGAACTCGGCAAAATGACCCCGTAACTTCGGGAGAAGGG6TG 
CTCTTTAGGGTTCACGCCCAGAAGAGCCGCAGTGAATAGGCCCAAGCGACT 
GTTTATCAAAAACACAGGTCTCTGCTAAACCGTAAGGTGAT 


Sequence 3379 

step.1002c06.cons.ok

TATCTAAAACCTCTTCATAAAGTGGACAAGAAGGTAAAGTTAAGTTATCCA 

TTTGAACTTTAATTAGATGTAATATTCTGTCTGCATCTTTATTCAATTGGT
CATATGCCGCACTATTTAACTTCGATTGGTTAGACATGACATATTCCCCCT 
ATCGTTTCAATTAATATCTTCACAATAGAAGTTTATCCTACTGAATGTTAA 
AAAGCAAGCGAATTTAAATTAAGTGAAAGCGATTCTATGTTAAAATAATGA 
AAGATAAAAATTTAGGGAGTGGGCATTTATGATTCAAATAAAAGGAGCAGT 

CAATTTCCCTATTTCATTGGATAGTACGACTTGGATATTTGACGATAGAAA
AGTTACTATTGATGATTTAGAACGTGGGGTATTTGATGGTACTAGACCCAT 
CAACTTTGATGATAACAAGGAATGGAACCGTGCTATTTTAGAAGGACAAAC 
AAATCCACCAACGCTTGATTCTGAAATTAAATATAAAAAACGATCGGTTTT 
AGATGAAACATTTGTGAATATTTATTTTTCTTTTTATCTTCATAACGTTCT 

TTAGCTAAATCAATCAATTTA6CAATTAAATCAGGGTAAGATAAGCCCATA
TTTTTCCATAAGTTTGGATACATACTATATGCAGTAAATCCTGGCATGGCA 
TTTGTTTCATTAATATAAATTTGATTATCATCAGTAACAAAGAAATCTGCA 
CGAACTAATCCCGAACAATCAGTAGCTTTAAAGGCCTCTAATGCCATGTTT 
CTTAATGTCATTTGAACATCTTGATCTAAATCTGCTGGAATATCTAATCTA 

ATCTTACCGTCTTTATACTTTGATTTATAATCATAAAACGCTACATCCTTA
ACAACCTCACCAGGCCATGTC6TTTCAGGATAATCGTTACCTAAGACAGCT 
ACTTCGATCTCTCTAGCATTAATCCCTTGTTCAATGACAAGTTTACGATCA 
AATTCGAATGCTTCAGCTATCCCAGATTTTAATTCTTCTTCATTGTTACAT 
TTACTTATACCAACACTTGAACCGAGATTAGCAGGTTTTACAAATACCGGA 

TATGTTAACTTATCATTAACTAATTTAATGATATTATTTTCATATTTTTCA
TACTCACTTCTTAAAAAGCTAATATAAGGTAATTGAGGTAAACCTCTATGC 
TCAAATAATTGTTTCATCACGAGTTTATCCATTGAGCTTGAAGCAGCTAAC 
ACACCATTACCTACATATGGTATATCAAGTACTTCAAAAAGACCTTGGATA 
GTTCCATCTTCTCCATTTGGACCATGCAATAATGGGAATACTGCATCATAT 

GATTTTCCTAAACTACCTTTACTGAGTAACTGTGAGATTTCTCCAGTCTCT
ACATCGTTAATGACGAGTTCATCAGTATTTTTTATTTCTTGTGTAATATTA 
TCTTTTTTCTTCCATTCACCATCGTTTGTTATATAAATGATATCAACTTGA 
TATCGTTCTTTATCAATTGCGTTTAAAACATTTTGTGCAGTTAAAATTGAA 
ACAT 

Sequence 3380

step.1002c07.cons.ok

TATATCGCTTTTGTTGCTTAAGGTTTAAATCGTAACCATCAACATTAATGG
ATAACGTGTAATATTTTTCTGGTTTAAAATAGTTAATCTCATCTTGTCTTG 
ATTTAACTATTTGTATTGTTGGAGTTTGTACACGACCTAATGATAATTGTG 
CATCATATTTTGTCGTTAGTGCACGTGTTGCATTAATCCCTACTATCCAAT 
CTGCTTCACTTCGTGCAAGTGCTGCTTCATATAAATTTTGATACGCGTTTC 

CATTTTTTAACTGTTTAAATCCTTCTTGTATGGCTTTTTTTGTAACCGAAC
TAATCCACAAACGCTTGATTGGTTTTTTATTACCTACTTTATCTAAAATAA 
GACGAGCTACTAGTTCACCTTCTCGTCCAGCATCTGTTGCTATAATAATTT 
CTTTAACATTTTTATCTAAAATTAAAGATTTTACAATTTTAAATTGTCTAT 
TTGTTTTACTAATCACTACTGTTTTCATTTTCTTAGGAATGATAGGTAAGT 

CTTCTAAATTCCATTCTTTATATGAAGGGTTATATTGTTCAGGCGTTGCAT
TTGTCACAAGATGCCCCAATGCCCAAGTTACTATATACTGTTTCCCTTCTA 
TATAACCATTACTTTTTTGTTGAAGATTCAAAGCATTAQCGATATCTCTTC 
CGACAGATGGTTTTTCAGCTAAGATTAAAGATTTCATTTTAATTTTCCTTT 
CACATTTATAAAATACGTCATGTTATAGAATTACATACATTGTAACATGTA 

TCTTTTTGTAAATTACTATATATAAAATTAATTAAGATAGTTATATATACA
TATCTTGTTCAATCATATAGTTCAAAAACACATTATCACTTAATACGAATA 
TTTACTTACTAAAAATTTCAACACTTTTAAGGTGTTGTTTTTTACTTCTTT 
TTTTAATTACTATAGAATGATTTCGAGTGTAATTGGGACACTTAATTTTGT 
TCTTAGGAGAGTGAGTAATAAGTGCAAATTGTAGATTTTTTGATAGCACTT 

TTACCAGCCCTATTTTGGGGTAGTGTAGTCATTATAAATGTTTTTGTAGGT
GGTGGACCATATAATCAGATTCGAGGTACAACTTTAGGTACACTTTTTATC 
GGATTTTCTCTACTTGCTACTGGACACGCAGCGTTTGATAACCTTACAGTA 
ATTATTGTCGGTTTAGTATCAGGAGCTCTATGGGCTTTTGGTCAAGGTAAT 
CAATTAAAATCAGTGCATTTAATAGGTGTATCTAAAACGATGCCTATTTCA 

ACGGGTATGCAACTTGTCGGTACCACTCTATTTAGCGCTATTTTCTTAGGT
GAATGGAGCACGATTGTTCAAGTAGTGATGGGACTTATAGCAATGATCTTA 
TTGGTTGTAGGTATTTCTTTAACATCACTTAAAGCCAAAAGCGAAGGCAAA 
TCCGATAACCCAGAATTTAAAAAAGCAATGGGAATATTACTTCTATCAACA 
ATCGGTTACGTAGGTTATGTCGTTCTTGGAGATATTTTTGGAGTAAGTGGT 

ACAGATGCTCTCTTCTTCCAATCAATTGGTATGGCAATTGGAGGATTAATC
CTTTCAATGAATCATAATACTTCAATTAAATCTACTGCTCTAAATCTTATA 
CCAGGTGTTATCTGGGGTATCGGTAACTTATTTATGTTCTATTCACAACCT 
AAAGTTGGTGTAGCAACTAGTTTCTCATTATCACAACTGCTTGTTATTGTT 
TCAACTTTAGGGGGTATCTTTATTCTAGGGGAGAAAAAAGATCGTCGCCAA 

ATGATTGGTATTTGGTCAGGTATTATCGTTATAGTTATAGCTTCAATCATT
TTAGGCAACTTAAAATAGAATTTTAAACTTTAGGGAGGTAACATAGTGTTT 
GAAGAATTAGAAAATAAAGTGGTTCTTATTACTGGAGCTGCCACTGGAATT 
GGCAAATCTATTGCGGAAAATTTTGGTAAAGCTAAGGCCAAGGTTGTTATA 
AATTACCGTTCTGATCGACATCATGATGAAATTGAGGAAATTAAACAAACT 

GTTGCTAAATTTGGTGGTCAAACATTGGTGGTTCAAGGTGATGTTTCAATT
GAAGAAGATATTAAACGAATGATTGAAACAACAATTAATCACTTTGGAACT 
TTAGACATTATAATTAATAATGCTGGATTCGAAAATTCAATCCCAACTCAT 
GAAATGTCGATTGACGACTGGCAAAAAGTTATTGACATAAACTTAACTGGC 
GCCTTTGTGGGTTCAAGAGAAGCCATCAATCAATTTTTAAAGGAAAACAAG 

AAAGGTACTATTATTAACATTTCGAGTGTTCATGACACTATTCCATGGCCT
AATTATGTACACTATGCCGCAAGTAAAGGTGGCTTAAAATTAATGATGGAA 
ACAATGTCAATGGAATATGCCCAATACGGTATTCGTATTAATAATATATCT 
CCTGGGGCAATTGTTACTGAACACACTGAAGAAAAATTTTCTGACCCAACG 
ACGCGTGAAGAAACAATAAAAATGATACCTGCACGTGAAATTGGAAATGCT 

CAAGATGTAGCTAATGCAGTACTATTCCTATCTTCAGATCTTGCAAGTTAT
ATACACGGTACAACATTGTACGTTGATGGTGGCATGATGAACTATCCAGCA 
TTTATGGGTGGTAAAGGTTAAGTAGCATTTTTCAATTTTAACTAGAAATAA 
AATGTCATAAATTAGTAATGAATAGTAGTCTTAAAAAGTATCAGTCATCGA 
CTGAATATAAGTGACAAACTATAATAAAAAAGCTATGAAGATATCTACAAT 
GATACATTTCTTCATAGCTTTTTTAGTTATATAT^TAAGTATATTGATTAT
GCAATCAATTTTTATACAAATACAGCTTGATAAATATACAAAATGATAATG 
CTAATATGAATATTTAGCAAACAATTCCGTTTCTAATGATGATCAATTATC 
ATCTGAAGAACAGCAGAAATACAATGAAAATACTAACGAAAACAAACAGAA 
TAGTCATGTTGACTTAAATTCATTACCTGCTACAGATTTTCGAACAGATTG
GATGTCTGAATCTGGTCAAAAACAAGTGAATGATTTAGCAAACCAAAAAGA 
TAGCGGTCAAATTTCACAATACGAATTTAATGATCGCGTATCTGATGTGAT 
GAATAATGAAATAGATAATCAACAGCCATAATTGAAATATAGGGTAGCACG 
TCTACCCTTTTTATTTAGGGAGGGTAAACACGTGGCATAAGAAATTCACCT 

ATAAACATGGTGAAACTAAAAATCGCTATTATGAGAAGTACAAAGACCCTC
TCACAAACAAATGACGACGTGTTAGCGTGGTACTTAATAAAGAATGGTAAA 
CAGTCACAAAAAGAGGCTCAGAAGCGCTTAAATGAGCTTATAGAGGCAAAA 
TTAAACGATAAGACACCTACTACACTTAAGTCACTAACATTCCATGATCAT 
TTGATGAGTGGTTTGAAAGATACAAATTAACTTCCGGATTTAAACAATCTA 

CCGTTACAACGAAAAGTTATAAAATAGCGCACATCAAAAGAAATATTGATA
AAAATATACTCGCTCAAAATACGAATGCTGATGTTTTACAAGACTTAATTA 
ATTCTTCATTAACAGAGGGATTAAGTCATAAAGTGGTTAAAGATGATTTGA 
GTATTATAAAAAATATACTACGCTATACTCAAAAGAAATATAACATTACTG 
ATAAATCTTACATAGATGATGTAATAGTACCTAAAAAAGCGACTACAATAG 

AAGAAGTTAAAGCAAAACGTGAGAACTATTTAGAAATGAATGAAGTTCATC
TTATCGCTGAAGAGTT 



Sequence 3381 

step.1002c12.cons.ok

GCTTTTCTCAAATTATCTTTAACACGTTCCTCATTGACTTTAGTTTGATGT 
TGTTCTCTTTTAGCTTGATTTTCCTTCTCTATTTTCTCACGTCTTATGCGC 
TCTTCTCGCTCTACGTTATGAAAGTATTCACGTCTTTTACGTCGATATTGA 
CTGCGTGGTATGACTTGTTTTTTATTGTTATCCACTAGAAGACCACTCCTA 

TAGTTTCTTTAACTCATTCTTTATCTGTAAAATTCAAGTTATTTTAATACT
ATATATCATCATCATTTTAAATTCGTTTATAATCTAAAGTTTAATATTTTA 
ACTATAAAACTGCAATGTTCTAATAAAAATTTU^TAAAAAGAACCTCCAC 
TTAAATTATTATTAAGCAAAGGTTCTAGCATATCAATCAAATTATTTTTGA 
TTACCTTTTTTCTTGTTGCCAATTGCTGAAGTTAACCATCCTATTAAAACT 

AAGCCAACTAATACTACCCAGAAGATTGTTTGCCATAAAGCACTATGTGGA
AATGCTTCTGGTAAAACGCCAATATCAGGATGTGCAAGTACCATTATAACA 
AGTTTAATACCTACCCAACCTACAATTGCAAACGCAGCACCTTCAAGTCCT 
GGATATTTATTCAACAATTCTACAAACCAAGTTGCTGCAAATCTCATCAAG 
ATGACACCTATCATTCCACCAAGGAACATAACAATAAATTGGCCTAAGTCC 

ATACCACCAAAATGTATGCCAACTTTTGGTAATGTAACG6CTATGGCTAAT
GCGGCAAGCATCGAATCAATTGCAAACGCGATATCAGCGAATTCAACTTTA 
AATACTGTTCCCCAAAAAGATTTAGGGCCTACTTCTTTTTCGTTGCCTGTT 
TCATCGAAATGATGTTCGTCTCCTGTTTCTTTATGGTGTTTTTCATTTGAC 
TGATGGAAAAATTGCCATAAATTTTTAATAGACATATAGATTAAGTAAACA 

GCACCTGCTGCTTGAATCCACCAGAAGTTTGCAATAATACTTATTAAAAAT

aaagcaataaatctaaaaatgaatgcacctaataggcx:ataaaaaagtgct 
tttttacgttgtttaggtggtagatgtttaaccattaccgccattacaatt 
gcattatcagctgctaataatccttctaaaaatacaagtactaaaagtacc 

CATAAATAAGGTAAAATCAAACTCGGATCCATTGTGGACAACTCCTTTTAG 
TTTAAATACAAATAAAGAGACCTATGCTAATTTAATAGCAAAGGTCTCACT
TAGCAATATTTTTGCCCACTTTACCGGAAGACTTATCTTCGTAATGACGAC 
AAAGTGTTGAATAGTTATTATCGGTTTAAATGCATTCAATAACTATTCTCA 
AGTTACTCCCCrCTGACAGTTTTGTCATAGGTATTTATTTGTTATTTTCAG 
TATACATACTATGACACTGCATAGCAATATAAACATTTTTCATATACTATG 
GTGTGCCCGAATAGTGATATGTTTAAACATTCAATTATAATAAACTATACA
GTTTAATATCAGAGAATTTTTCTTCAAACCAACGCGTTGCAAATTCATTTT 
CAAATAGGAATACATAATTGTCGTATCTATCTTTTACTAAAATTGATCTAG 
AAGTATTCATATTATCTTTAATATCTTCTTCATTTTCAATCCAACGCGCAA 
TTTTACGTCCAACTGGTTCCATAACCACATCTACATTATATTCATTATTCA 
TGCGGTGTTCAAACACCTCAAATTGTAATTGACCCACAGCACCTAAGATAA
TTTGATTTGTGTGCAACGTTTTATAATATTGAATTGCACCTTCTTGAACAA 
GCTGTTCAATCCCTTTATGAAAATGCTTTTGTTTCATTACATTTTTGGCAG 
AAACCTTCATAAATATTTCTGGAGTAAATTGTGGTAACTCTTGGAAACTAT 
ATTTCTGCTTACCACCAACTAAAGTATCGCCTATTTGATAGTTACCTGTAT
CATACAAACCAATAATATCGCCTGCTACAGCATGATTAACAGTTTCTTTAT 
CATCGGCCATAAATGATGTGGAACGAGTTATTTTTTGTTTTTTATTAGTTC 
TTTGTAAGGTTACATCCATTCCTCTTTCAAACGCACCACTTACCACGCGCA 
TAAATGCAATTCTGTCACGATGCTTTGGATCCATATTGGCTTGAATTTTAA 

AGATAAACCCAGAAAAATCTGTGTCGAATGGACTAACATCAACCTCTTCTT
TAGTTTGTCTCGCATTAGGCATTGGCGCATGATCAACGTATGCATTTAAGA 
AATTTTGTACGCCGAAGTTTGCTAATGCCGATCCAAAGAAAACAGGTGTTA 
ATTCACCGTTCAGTAACGCTTCATTATCAAATGCTTCACCCGCTTCATCGA 
CAAGCATCATTTCTTCAATTGCTTGTTCAAATGCACTATCATTTTTTATCG 

CGTGTTCTTCTTTAAGTTCGTAATCTTCGTTTAAATGTAATAAATTTTCTT
CATCTCTAAATGGTTCAATCGTTTTAGAGTGTCTATCAATAATACCAAAGA 
AATTTTGTCCCATACCAACAGGCCAATTCATTGGATAAGTGTCAATATTAA 
GTGTTTCTTCTATTTCATCAAGTAATTCAAATGGCTCTTTACCTACTCTAT 
CCAATTTATTGATGAATGTAAAAATAGGGATACCTCGCATTTTACAAACTT 

TAAATAATTTTAATGTTTGTGGTTCTATACCTTTTGCACAGTCAATAACCA
TAACTGCACTATCTACTGCCATTAATGTACGATACGTATCCTCTGAAAAAT 
CTTCATGCCCGGGCGTATCTAAAATATTTATCTTATAATCATCATAATCAA 
ATTGCATTACAGAGCTTGTAACTGAGATACCACGTTCTTGTTCTACCTTCA 
TCCAATCACTTGTTGCAAATTTACCCGTCTTTTTCCCTTTAACTGTCCCAG 

CTTCACGAATCGCTCCACTGAAATACAACAATTTTTCAGTTAAAGTTGTTT
TACCAGCATCAGGGTGTGAAATGATGGCAAATGTCTTTCTTGACTCTATTT 
CTTCTTTTAAATTCATTTATCATTCTCCTTTGGATTCATTTCTAATCAAAT 
CAGTTATATTACGTGCCAATGCATCTGCTTCAACTTCAGTCAACAAATTGA 
ATAATTGCATCACTAATTCCTCATATAGCTGATTTGCATCTAACTCTTTGG 

ATAACTCAATGGACCAAAAAAGTTCTGGAATAGCTAACATGATTAAATCAT
TAGAATGATAATGATATACGTTCATCATCATTCCATGAAACTGAAGTGATT 
TAATCCTCAAGAGGTCCACCACCAAATTTTTGATATGCTGCTTTTAAGCCA 
ATTAAGTCATCGCGATGTGGGACTTTAACATGACCAGGCATAATTTGATAA 
GGCTCTCGACCTTTTGAGGCCAAAACAACTGTATCACCTGGTTCAGCAATA 

TCAATCGCGTGTCTAATACCTTCTGCACGGTCATCAAACTCTATATAATTG
TTATGCGTTGCACCTTTAGCTAATTCAGCTGTCAACATTTTAGGATCATCG 
TTAGCAGGATTATCTGGAGTAAAAATAACATAATCTGCTCGACATGCTACT 
CTCCCCATCTCAGGTGTTTTCGTTAAATCTCTTTCCCCAGCCATACCTACT 
AAAAAGATTAGCTTTTGTTTAACAAATGGTTGTACAGCATCAATCAGTTTA 

TTCATACCATCAGCTGTATGTGCATAATCTATAATCAAATCTATAGGAAGT
GATGGATCAAGTACTTCTAAGCGCCCCTCTACAGGTTCAAGCTCAGTAACA 
GCATTAATAATTTCATTTAAATTTGTTCCCTTACTCCAAACCGCAATCATA 
GCAGCCATAATATTTGAAATGTTAAACCTACCTACATAAGGAGATTTAACT 
GGAAAACTTCCAAATGGAGTACAAAATTCAAATTCTACACCTTGCAATGAT 

TCTTTAATGTTTTTAGCCATAAATTGTGCTTCGTGAGTAATACCATATGTA
AATACTTCATAGGGTGTCACACTTGCTAAATAATCGGAGAAG 



Sequence 3382 

step.1002d01.cons.ok

CTGTCGTAAAGCTGGTATTATTACTGGTTTGCCAGATGCTTACGGACGTGG 
ACGTATTATCGGAGACTATCGTCGTGTTGCTTTATACGGTGTAGATTTCTT 
AATGGAACAAAAACTTAAAGACTTTAACACAATGTCTACTGAAATGTCTGA 
AGATGTAATTCGTTTACGTGAAGAATTATCAGAGCAATATCGTTCACTTCA 

AGATTTAAAAGAATTAGGTCAAAAATATGGATTTGATATTAGCCGTCCTGC
TACTAACTTCAAAGAAGCTGTGCACTGAGAGATCCCCTCATAATTTCCCCA 
AAGCGTAACCATGTGTGAATAAATTTTGAGCTAGTAGGGTTGCAGCCACGA 
GTAAGTCTTCCCTTGTTATTGTGTAGCCAGAATGCCGCAAAACTTCCATGC 
CTAAGCGAACTGTTGAGAGTACGTTTCGATTTCTGACTGTGTTAGCCTGGA 
AGTGCTTGTCCCAACCTTGTTTCTGAGCATGAACGCCCGCAAGCCAACATG
TTAGTTGAAGCATCAGGGCGATTAGCAGCATGATATCAAAACGCTCTGAGC 
TGCTCGTTCGGCTATGGCGTAGGCCTAGTCCGTAGGCAGGACTTTTCAAGT 
CTCGGAAGGTTTCTTCAATCTGCATTCGCTTCGAATAGATATTAACAAGTT 
GTTTGGGTGTTCGAATTTCAACAGGTAAGTTAGTTGCTAGAATCCATGGCT
CCTTTGCCGACGCTGAGTAGATTTTAGGTGACGGGTGGTGACAATGAGTCC 
GTGTCGAGCGCTGATTTTTTCGGCCTTTAGAGCGAGATTTATACAATAGAA 
TTTGGCATGAGATTGGATTGCTTTTAGTCAGCCTCTTATAGCCTAAAGTCT 
TTGAGTGACTAGATGACATATCATGTAAGTTGCTGATAGGTTTCCAGTTTT 

CCGCTCCTAGGTCTGCATATTGTACTTTTCCTCTTACTCGACTTAACCAGT
ACCAACCCAGCTTCTCAACGGATTTATACCATGGCACTTTAAAGCCAGCAT 
CACTGACAATGAGCGGTGTGGTGTTACTCGGTAGAATGCTCGCAAGGTCGG 
CTAGAAATTGGTCATGAGCTTTCTTTGAACATTGCTCTGAAAGCGGGAACG 
CTTTCTCATAAAGAGTAACAGAACGACCGTGTAGTGCGACTGAAGCTCGCA 

ATACCATAAGCCGTTTTTGCTCACGGATATCAGACCAGTCAACAAGTACAA
TGGGCATCGTATTGCCCGAACAGATAAAGCTAGCATGCCAACGGTATACAG 
CGAGTCGCTCTTTGTGGAGGTGACGATTACCTAACAATCGGTCGATTCGTT 
TGATGTTATGTTTTGTTCTCGCTTTGGTTGGCAGGTTACGGCCAAGTTCGG 
TAAGAGTGAGAGTTTTACAGTCAAGTAAGGCGTGGCAAGCCAACGTTAAGC 

TGTTGAGTCGTTTTAAGTGTAATTCGGGGCAGAATTGGTAAAGAGAGTCGT
GTAAAATATCGAGTTCGCACATTTTGTTGTCTGATTATTGATTTTTGGCGA 
AACCATTTGATCATATGACAAGATGTGTATCTACCTTAACTTAATGATTTT 
GATAAAAATCATTAGGGGATTCATCAGAGCTGTGCAATGGTTATACTTAGC 
ATATTTAGCTGCTATCAAAGAACAAAATGGTGCAGCAATGAGTTTAGGACG 

TACTTCAACATTCTTAGATATTTATGCTGAACGTGATTTACAAAATGGTGA
CATCACTGAACAAGAAGTTCAAGAAATCATTGACCACTTCATTATGAAATT 
GCGTATCGTTAAATTCGCGCGTACGCCTGAATATAATGAATTATTCTCTGG 
AGATCCAACTTGGGTAACTGAATCTATCGGTGGTGTAGGTATTGACGGACG 
TCCAATGGTAACTAAAAACTCATTCCGTTTCTTACACTCATTAGATAATTT 

AGGTCCAGCACCAGAACCAAACTTAACAGTGTTATGGTCTACTCGCTTACC
TGAAAACTTCAAAATCTATTGTGCTAAAATGAGTATTAAAACGAGCTCAAT 
CCAATATGAAAATGATGACTTAATGCGTGAAAGCTATGGCGATGATTATGG 
TATCGCTTGCTGTGTATCTGCCATGAAGATTGGTAAACAAATGCAATTCTT 
CGGTGCACGTGCTAACTTAGCTAAAGCATTACTTTACGCTATCAATGGTGG 

TAAAGATGAAAAATCTGGTAAACAAGTTGGGCCAAGTTATGAAGGTATTAA
ATCAGACGTACTAGATTATGATGAAGTCTTCGAAAGATATGAAAAAATGAT 
GGACTGGTTAGCTGGCGTATATATCAACTCATTAAATATCATTCACTATAT 
GCATGATAAATATAGCTATGAACGTCTTGAAATGGCTTTACATGATACAGA 
AATTATTCGCACAATGGCAACTGGTATTGCCGGATTGTCTGTAGCAGCTGA 

CTCTTTATCAGCGATTAAATATGCACAAGTTAAACCTATCCGTAACGAAGA
AGGTCTTGTAACTGACTTTAAAATCGAAGGCGACTTCCCTAAATATGGTAA 
TAATGACAGTCGTGTTGATGAAATTGCAGTAGATTTAGTTGAACGTTTCAT 
GACTAAATTACGTAGCCATAAAACATACCGTAATTCTGAACACACAATGAG 
TGTATTAACAATTACTTCAAACGTTGTTTATGGTAAGAAAACTGGTAACAC 

ACCAGATGGACGTAAAGCTGGCGAACCATTTGCACCTGGCGCAAACCCAAT
GCATGGTCGTGACCAAAAAGGTGCATTATCTTCACTAAGTTCAGTAGCTAA 
AATACCTTACGATTGCTGTAAAGATGGTATCTCAAATACATTTAGTATCGT 
ACCGAAATCACTAGGTAAA6AAGAAGCAGATCAAAATAAAAACTTAACTAG 
TATGTTAGATGGTTATGCAATGCAACATGGTCATCACCTCAACATTAACGT 

ATTTAATAGAGAAACATTAATTGATGCAATGGAACATCCAGAAGAGTATCC
ACAATTAACGATTCGTGTATCTGGATACGCTGTAAACTTCATTAAATTAAC 
ACGTGAACAACAATTAGATGTTATTTCACGTACATTCCACGAATCTATGTA 
ATAAATTTAAGGTGGGAGCAAAATGCTTAAAGGACACTTACACTCCGTTGA 
AAGTATGGGCACTGTCGACGGGCCAGGACTTAGATATATATTATTTACTCA 

AGGTTGTTTGTTAAGATGTTTATATTGTCATAATCCAGACACTTGGAAGAT
TAACGAACCATCAAGAGAAGTGACGGTTGATGAAATGGTAAATGAAATCTT 
ACCGTACAAACCTTACTTTGAAGCTTCAGGTGGTGGGGTAACAGTCAGTGG 
TGGCGAACCATTACTACAAATGCCTTTCTTGGAGCAATTATTCAAAGAATT 
AAAAGCGAATGGTGTTCACACATGCATTGATACTTCTGCGGGATGTGTGAA 
TGATACACCAGCATTTAATCGTCATTTTGATGAATTGCAAAAGCATACAGA
TTTAAACATATTGATAATGATAAGCACATCAAATTAACAGGCAAACCTAAC 
ACACATATTTTAAAGTTTGCACGTAAATTATCTGATATGAAACAACCTGTT 
TGGATTAGACATGTTTTAGTACCTGGTATTTCGGATGATAAAGAAGATTTG 
ATAAAACTAGGAGAATTTATTAATTCTTTAGATAACGTTGAAAAGTTTGAA
ATCTTACCATATCATCAACTCGGTGTGCATAAGTGGAAAAATTTAGGCATC 
CCTTATCAACTCGAAAATGTTGAACCATCTGACGATGAAGCGGTTAAAGAA 
GCTTATCGCTATGTTAACTTTAATGGCAAAATACCCGTAACATTATAGAAA 
GATAAATATAGTAATAGCCTGAGACATCAATCGTTGTCTCAGGCTTCTTTA 

CGTTTAATCAGTCGATAAACTGAGTATAAAATATATCTCAATTAACTTTAT
TTAATAAAAATATGATTCATAAGTTTAAATAGTTTATTTTTCTAGATATGC 
TTTAAATACATCTTTATAAACATCGATGTAACTTAAATACATATCTTTTCT 
AATAAATTCATCAACTTGATGCGCCATAATAGATTCACCAGGTCCAAAGAC 
AGCAAAATCCACGTTATTTTCATTTGTTCCTAGGAAACTAGATGCATCAGT 

TGTACCAATCAATGCACTCACAACAATGTCTTCATGTACATAATTCGGTGC
AATACGTGTAATATTTTGAATAAGAGGATTATCACGATCACTTGCCACTGG 
ATCGTGACTGCTAGGTATATCTACAGTTAAATAATCTTCGCCCACATGACG 
AATGACTTTTTCAAATAAATCCTTCACGAAAGTACTGTCATATTCTGGAAC 
AGTTCTTACATTATATTTAGCTGTCGCTTTATGAGGAACAGAATTAACTTG 

TTTACCACCATTGAATACAGAGTTTAACATTACAAATCCAGAGTAGATATG
TGATTCTTCTTCACCAATTTTTCTGTGGAGATGTTTCTCAATCATTGGAAC 
AGCGTCTAACTCGTGTACTTTATCATGTTCTTTAATATTTTTATATTCTTG 
TTTCATTTCATTTACAAAATCAACTAAAATATCAACAGCATTTGTACCTAG 
GTGTGGCATGGAACTATGTGCAGCCTTACCTTTCGCTGTGACTACACAGCT 

CATAGAACCTTTATGAGCATAGTAAGCGATATTACTAGTTGGTTCGCCAAT
AATTAAACCACTCACATCATCTAAGTAACCTTCATCAGCTAGTAATTGTGC 
ACCATATTGTT 



Sequence 3383

step.1002d02.cons.ok

GAGATAATCATAAAAATTAGATACTACAGTCATAGTCAGATTAATTGTCTT 
AGGACTAATGCTTTTTTCTTTTCGGTGATAAGAGAGGACATTCTCATATTC 
AAAAGGTGTTTTCATCCATCTTATAAAATCAACAAAGTTATCAAAACTAAC 

TTCTTTATAGCATATCTTTTTACTTTCTAAGTAAACAAAAAAATTTTTTAG
TGCATAGGCATAGGTTTTCTTGGTATTTAAACTTTTCTTAACACTATCCAG 
ATACTTCAAATATCTTACTGCATCTACTATAGGTTCATTATTACCATCTAA 
AATCATAAAATTGGTACCATTCTTAGATTTTACTTCTACTATTTTCATAAT 
AATCTCATTTCTAATTTTAATAGTATAAGTATACACTATTCAAATTCTTTT 

TACATCATACATATTAAACTTATTACACGTAATAAAAAAAGACTATTTAAT
ATTTTTACTTTGTTCATTTTTTCTTAAAATATGATTCATTTTAAACAAAAT 
AGTCTATATTCTTCAACATATATATAACCATTCGCAACTTTACCAGAACAG 
TTTGAAACAATTTATCAATATATGATAGATCAAAATAAAGTGAGCAGACCA 
GTGCTTTCAGACGACCAACTTTCGCAACTAAATATACATTTACATGAAGCA 

CTACAACAATCACGCCCAGTCAATATCAAATATTACGAAGAAGGATATATT
AATTTTATAGAGCTCATTGTACATCGGATTGATTCAATAAATTAT6AGATT 
GAAGGTACTGCTCCTCATTCAAGAGAACGTCATAAAGTTTCATTTTTAGAT 
ATTATAGATATTTCTTTTATATAAAGAGTACAAGAGAATTTATTATGAATT 
TCTCATTTTACACTTATATAACCACCAACATACTTAGTGTTATCTAAAGCC 

TTTCTGAATGAAATCATTTAATTATTAATTTTAGAATTCAATAATTAATTT
TGACTGATAATTGGTAAAATCAACTAATAAAAATAAGCCTACACATCTCAA 
TGATGTACAGGCTTATTCCTAAATTATTATAAATTAATTGTTGTTTTCGAT 
ATCAGAAACGATTTCTTTCGTTTTTCTTTCGATATCTTCAAAGTCTTTTTT 
AAAAATATAAGCAAAGGCAACTACACCGATGCCTACTACTACTGTAGTAAT 

AGCTAATAAACTAAAGACAAATTTGAATAACCCTTTAATAAAGTTCCACAT
TAGTTTTCACCTCTATAATTAATATGCATAGCTACATTTCATAATGATAAC 
ATACTTAACTTTATCATTAAATATATACCCTTTTTACAGGTAATTATTCTA 
AATATGAATATAAATTATGATAATATATCAACTAAAGATGCTTAAAATGAA 
ACAATTAAATGATTTCTTTAATCATGGTACACACACAGATTAACTTAGCAA 
CCAAGATTTTAATTTATCAGTGATTACTTTGTAATTATCAATATTCAGAAT
GATCACTCCTATTGAGATAATGTCATTATCTTAATTTAGATTGTATGATTT 
ACGCAAATCAAGTTCAATTAATAATAAGTTACTTTTATATTACATTGAAGT 
AAAAATATAAGGAGAGATTATAAAATGCAAAAAGTACGTTCTGATATAATG 
ACTCATCGTGGTTCTCACTATGATTTAGGAGTAAAGACTGCTCTTTGGTTA
CAAACCACTCCTTTATTAAAAAATCGAAATAAAGAATGGCGAAAGAGAATT 
CCACGTTTTGATATTGATGTCAAGGAAACCTATGATATATTTCAAATCTAT 
TCGCCACAGATTTGGGAAGAAATAATTGGTATGCAAGATGTATTGAATCTA 
CCTACAAAAC/^TGATTTTAAATTTTGGCCATTATCGATTTACTGATTTA 

AAGGACAGTGGTTGCACAGTATATAAAGGTCGTGATTTTTTAGTCCGAAAT
TATGATTATCATCCTGCAACATATGATGGTAGATACTTATTATTTCAACCT 
AATGACGGGGGATTATCTCAAATAGGACCGACTTCAAGAGTGACTGGTAGA 
ATGGATGGTATGAACGAGTATGGTTTAGTTATGGCATATAATTTTATGCAT 
CGTAAAAAGCCTGCAAATGGATTTGTATGTTACATGGTTGGTCGGCTAATA 

CTTGAAAATTGCAAAAATGTAACTGAAGCAATCAAATTTTTAAAGGAAGTA
CCGCATCGTAGTTCATTCAGTTATATACTAATGGATAGACATTCGAATTAT 
GCCATTGTCGAAGTTACACCTCGATCAATAGATGTAAGGTATGAACATATA 
TGCACAAATCATTTTGAATTGCTTACCCATGAAAATAGAAACTATACAAGA 
GAATCTAAAGAACGCTTAAATCGTGTAATAAATAAAACAACTCCTTCTACA 

AACAAAGATATCGCATTCAAATTATTTAACGACCCGCAATACGAAATCTAT
AGCAACCTATTTAAAAGTTGGTCTGGTACAATTCATACTTCACTATATGAA 
CCTAATTCATTAATATCATGGATGGCATTAGGTCAAAACAGTCATCCAACC 
TCAATCAATTTTTCTAATTGGTTAAAAGGAAAGAAATTGAATATAAATTAC 
TTTGAAGGCGAAATAGATACACCATTAACTTTTGCCACATACTAATTAAAA 

ATTCGACCCTATCTTGATAAATCAAAGGGTCGAATAATTTATATTATTTAT
TCAATTTTAATATCGCCATCTACAGTGTAAAATTCCAATATATTATCGCTA 
TTTCCCACTTTTCCTTTATCGAAATATCGATTTTTAATTTCTTTATTTCCA 
TGACCAGGATGAAGTTTAAGCAAAGTGTTTTTAGGTTTTTCGCCAAAATGA 
TAATTAATATTACCTCTTTGTGTAGAAGCTTTTATATCATTGTTAGATTTC 

ATGTCCGTCATAGAAATATCTCCTCTATTAACCAAGAAAACAGTATTAGCA
CTCTCGTCAATAATATTAACAGCATGTGGACAAGAAAATGTAAAAAAAAAC 
AGAAA6CAATCCAAAAGAAGAACAAAATA 



Sequence 3384

step.1002d03.cons.ok

TCTGCACCATATTTTACTGCTTGGCTCACTTCGTAATCTGCTGCATCCATA 
ATTTTCAAGTCAGCTAATACTTTAGCATTATTAATATTTTCATTTAAATGT 
TGAACTGCAGGT/^CCCTTCATTAATTACAATTGGCGTACCAATTTCAACA 

ATATCTACATATTCTTCAACTTTTTGAGCT7VATTTTGCTGCTTCTTCTTTA
TTTAATAAATCAATGGCTAATTGTAGTTCCAATGAAATTCATCCTTTCGTT 
TTGGGAAATTCTCTTTTGCTTATAGTTGTATACCCTTCGCTAAACTTATTA 
AACATATAAAATATTTTGTATATAAAACAACTATATAATTATAAACAATCT 
GGCGCTGCTTCATCGTCTACATAAACTTC/U^CATTAGGGTGTGTGTGTAAA 

ATGGTCGCAGGTACATCTTCGGTAACCTGTTCATTTAACAGTTTACTTATA
GCCTCTTTTTTCTTTGGACCAAATGCGAGTAGGATAATCCTTTTTGCTTTT 
AAAATACTTTTTACCCCCATTGAAACTGCTTGTCTAGGAACATCCTTTTCA 
TTGTCAAAAAATCGACTATTTGCTTTTATGGTGCTTTCTGTTAAGTTCACC 
ACATGTGTTTCACTATTGAAGTCAGTCCCTGGTTCATTAAAACCAATGTGA 

CCATTTTCTCCAATTCCTAAAATTTGAATATCGATTGGCCCTCTTTCATCT
AATAAATTGTTATATCGTTCCGCTTCTGCTTCTAGATTTTCTGATAAGCCA 
TCAGGAATATGAATGTGATTTTTAACAAAATGAGGATATTGCTCAAATAGC 
ACTTTATTCATGTAGGTATGGTAACTTTGTTGATGACTTGCTTTTAAACCT 
ACATATTCATTTAAATTAAATGTTTCAACCTGCGACACATCAACTTTATTT 

TTGATTAATAAATTTACTAAGTAATCATATACGTCAGTCATAGTTCCACCA
GTTGCTAAACCGAGTTTGGCATGAGGGTGCTGCTGTATTTGTTTAAACAAC 
TCGCATGCTACATAAAATGACGCAAGATTTTTTGAATCTAAATTAATAATT 
TTCAATATCAAACACCTCTAATTTCAATATCACAATGATTGATGACTAACG 
CCTTACTCTTTTATATCTCTTAAATTATACTGTAACAAATGATTAAGATTT 
TAAACATATTAACGCTCTATATGTAATATTCAACGTTATTTCTTGAATTTT
TCCTAATTAAATCGAGCATTCGTGTACAAATATACATAGCTTTCTAGTAAA 

ATAGATAGATAAACGAGTTTGCTGGTAAATAATATGTTTATCGGTAGTCCG 
TCACTGACATATTGTTTAGTTTAAGGGAGTGAACGTGTTGGAACCAATTAA 
AGAACAAGAAGTGCTAGATTTATTAACTTCTTACTCAAATCAGCCTGTTTA
CCTACACGTTGAAACAACAAATGGTGCTTATGCAAATCATTTCGATCAACG 
CGTATTTAACGCTGGAACATTTTTAAGAAATATTGTCGTGACTTTTGAACA 
TGCACAACTTAAAGGCGGCGACAAAGATCCATATCGTGTAGGTCTTAAATT 
AAAAGATGGTGGCTGGGTTTACGTGCAAGGACTTACGCACTATGAAGTTAA 

TGAGAATAACGAATTTTTAATTGCAGGTTTTT^TTATGAAGGACAATTGGC
TGCTACAATAGAAATAAGTAAACAGCCATTTACTATATAAAGGAGGTTGCC 
AATCATGACTGATGAAAGACACGTACTTGTGATTTTCCCCCATCCTGATGA 
TGAAACTTTTTCGTCTGCTGGAACTATCGCAAGTTATATTGAAAAAGGTAT 
TCCCGTCACATATGCATGTCTTACCCTAGGACAAATGGGACGTAATCTAGG 

TAACCCTCCTTTTGCAACAAGAGAATCTTTACCATTTATACGTGAACGTGA
GTTAGAAGAAGCATGCAAAGCAATTGGGATTACAGATTTAAGGAAAATGGG 
GTTAAGAGATAAAACTGTTGAATTTGAACCTTACGATCAAATGGATCAAAT 
GATTCAATCACTTATTGACGAAACAAATCCATCATTAATTATTTCGTTCTA 
TCCTAAATTTGCAGTTCACCCTGATCACGAGGCAACTGCAGAAGCTGTAGT 

ACGTACAGTTGGACGCATGCATGAATCAGATCGACCCCGTCTTACACTTGT
AGCGTTTAGCAATGATGCATCAGAAATTCTTGGAGAACCTGATATTCAAAA 
TGACATATCTCAATATAGTGATATAAAACTTAAAGCTTTTGAAGCACATGC 
TTCACAAACAGGACCATTTTTAAAACAACTTGCTAGTCCCGAAATAGATGG 
TCAAGCACAAAGTTTCTTAAAAATAGAGCCATTTTGGACATATCACTTTGA 

ATCTTAAATGGAGGTAAAACATGACAGAATTTGACTTATCCACTAGAGAGG
GTCGTTGGAAACATTTCGGTTCTGTTGACCCTGTCAAAGGTACGAAACCAA 
CTACT7VAAAATGAAATGACCGATTTACAAAGTACTCATAAAAATTTCTTAT 
TTGAAATAGAGGAAGTAGGCATTAAAAATTTAACTTATCCAGTTTTAATTG 
ATCAGTACCAAACAGCTGGTTTATTTAGTTTTTCAACGAGTTTAAATAAAA 

ATGAAAAAGGCATTAATATGAGTCGCATATTAGAAAGCGTTGAAAAACATT
ATGATAATGGCATTGAACTTGAATTTAACACACTACATCAATTGTTGCGTA 
CTCTACAAGATAAAATGAATCAAAATGCTGCAGGTGTTGATGTGTCAGGTA 
AATGGTTCTTTGATCGTTATAGTCCTGTGACTAATATTAAAGCTGTAGGCC 
ACGCAGATGTTACTTATGGTTTGGCTATTGAAAATCATACCGTTACACGCA 

AAGAATTAACTATTCAAGCCAAAGTAACAACACTATGTCCTTGCTCAAAAG
AAATTAGTGAATATTCCGCACATAATCAACGTGGTATCGTTACAGTTAAAG 
CATATTTAGATAAAAATAATGATGTCATCGATGATTATAAAGATAAAATTT 
TAGACGCTATGGAAGCCAACGCTAGTTCTATCTTATATCCAATCTTAAAAC 
GTCCTGATGAAAAACGAGTAACAGAACGCGCTTATGAAAATCCACGATTTG 

TTGAAGATTTAATTCGACTAATTGCAGCTGACTTAGTTGAATTTGACTGGA
TTGAAGGGTTCGATATTGAATGTCGTAATGAAGAGTCTATCCACCAGCATG 
ATGCTTTCGCACGTCTGAAATATCGCAAATAAAACATTGAATTTCAACTAC 
AATGCCAGTTGTATTTTTAGTTTATAAGATGCAACTGGCAATTTTTTAGCA 
TCATCAATATCGATGATACATAATGTAACACAAAAGAAACTCACCCCATGC 

TAAACATTCAATGTTGAATTTATTA7KAGGTCGTTACGAATCTAAATAAATC
AGAAATTCTAGATTTAATTTATTGTAGTGTATTTATGCACTTGTAATTTAC 
ATACATATTATAACGTTTACAACCAACTTTAAGTCATCAGATTAAAGTATT 
TATTAATCACACATACGTGTTTACTAACAAATGAACTTCAAACAAATGCAT 
CAATTCAATTATAGGTTACCTGAGCTATTGACAAACTATTTACATTTAATA 

CATTCGACTAGTGAACCAAGAATGTGTCATGCTATGAAGAAAGGTGAAATT
CATGATTACTG 



Sequence 3385 
step.1002d04.cons.ok

ATATTGATACTTTCAATCTACTGTTCAACATCTTTTTAGGTAATTGACTCT 
GTATATATGCTTTATGCTTTGTATAAGACGTTGGCATGTATGTTTGGATAA 
ACTTACCTGCATTCCTAAAACGTGGACGAGGAGAGCCGATAGGTTCCTTAT 
ACGTATCGTTAAAATTAATCTCTATTTCCATAACTCACCTCAAAATAATAA 
TTCGTTAATTGTCATCTGTTGTTGCAGTTCTTCTTTTCTGAACAACTTATG
TTTACGTTTAAGTTTTTCTAGTTCATCTTTCGTTACTGTTCCTGAGAATGT 
GTTTCTAAAGTGTATGCCTGCATAGTTACCTAGTTTGAATGTATCTTCTCC 
TAACGGCGTTACACTACACATCTCCCAACCGTCAATTTGATACAACATGTA 
TTGCTTTTTAAGTCCGTCGATAAGCCCCATCTGGTTGCCTCCATTTCGTTT
CATTCATGATTAATTCCTGAACTTTTTCATATTCGTCAAATGGTGATATCG 
TTrrGTTTTCTAACAAACGTTTAACTGCCCAGCCTGACTCAATAAGCGTCT 
TAGCTATTAATGGGTCGTTTTGATAATCTTCTCGATACATAACGCCTAACA 
ACTTTTGATATTCAACTACTTTCATGTGAAGAACCTCTGCGTTTTCTTGTA 

GTACTCAAACTCAACTACACCTGTTTCTCCGTCTTTATTCTTTGCGATGTT
ACATTCAACAATTGACTTGCCTGAGTCATCAACATCATCACGGTTGTAGTA 
ATCATCTCGATATAACAACATAGCTAAACTTGCATCTGCTTCAATTCCACC 
TGCTTCTTTCATGTCAGATAGCATAGGTCTTTTGTCATTTCTTGTTTCTAC 
ACCTCTGCTCAATTGAGATAGCAACACAATAATTGCACCTGTTTCATTTGC 

AATAATCTTCAAATCTCGCGATATCTTTTCGATACCATTACGACGATCTAA
CTTACTGTCTGTCTGCATAAGTTGTAAGTAGTCAATGAAGATAACCTGTTG 
CACATCTTTGTTCTTCATCGCTTGTTTACGTACATCATGTGTAGTAATATT 
GCTTTTATCGTGTATATCTATATCAAGTTTGAGTATTCTGTCTGCTGCAGT 
TGTTAAACGTGTTAATTCATCCGGTTCTAAATCTTTAATTTCTTTGATACG 

AGTTAGTTCTATCCCAGTTTCTGCTGATAACATCCTTTTCAATACAGACAC
GCCAGTTGTCTCTAGACTGAAGAATGAAGTTTTATAGCCTTGAGACGCTAT 
ATTAAGCATCATATTAAGCGCAAACCCCGTTTTACCTACTGACGGTCTCGC 
AGCGATTACAATCAACTGTGTAGGTTCTAAACCACCTATTTTGTAATCCAC 
CAGTTTATAACCTGTATTGATTTTTTGTTTTGGTTCTTCGCTATATAATTC 

TTCGACAAAGTGATCTACAATTTTTTTAGTCCCACTTTCTTCACTTGCACT
AATTAAACTGACCTTTTGTAGTTTGTTAAGCATTTCATCAAAATTTTGTAT 
ATTCGGATCAGAATTGAATTCTTGTAATACGTTTTGCGTACGCTCTATTTG 
ATAAAGATTGAGCAAATCTTGTTGATATCTTTCAAAGAATCCGTAACCTAT 
AAATTTTGAGTTATACAAATTTGAAATGGTGTCCATATCTAGGAATGACTT 

ATCTTTTGTGGTTTTTAAATAGATTTCGTTATGATCTACCTTACCGACTTC
GAATACATATTCCATAAATGACTTCATACCATCATGTGAGAACATTTCCGG 
TCTCACACGTAACTTCTCAATTATGTCCGGTTTTTGAAGTAAACTAGCAAC 
GATTGTACTTTCGATTTCATGACGTTCATTCATCGTCACTCACTCCAAACT 
CACTTAACTTCTTTCTGAAGTCGTCTAATATCTTTTTACGTTGTGCTACGT 

ATTCTGGATCATTTTTCATTCTCCAACGATGTCTAGCAGTTTTTTCGTCGA
CAGGCTCTTCTTTTACGACTTTGACTTCTTTTCTCATTATGTTTGGAATAC 
TAGGTGGATAAGGATTAGCATCATTGATATATTGCATTACTGTTTTTTTAG 
TCGGTTCATAATCCCCATTTTGGCTCAAAATGTTAACCCATGTTTCTAATT 
TAGGTCTGTCAAAGTCAATGTTGTATACATGTCTAATTGTCTT7ATTACTT 

CTAGGGCTTGTTGTTTAGTCATAGGCATTAACTTTCATCTCCTAGCTCTTT
CTCCATTTGAGCAATTACATCATCTGTTACAGTTTTTTTATTTTTAGGTTT 
AATTTTATTTTCAGCCTCTTTTTTATTCTTAACGTTCTCTTTAGCCCAGTT 
GTTTAACACTTTGATTAGGTAACCTGCATGACAACCTTTGTCTTTTGTATA 
ATCAGTAGCTACTTCAACAACTTCATCTGCATGTTGTCCAATATCATCAAT 

GGCATATCCTATCTGTTCCATTTGGTTAGGAGTTAAATTATGAGTAAGGTT
ACTCATGATGTAATTAATTGAGTTTTTGAAGATATCTTTATCTACT^ 
TTCTCTTTCTTCTTCTACTTCTTTATCTTCTTCTATATCTGTTGCGTGACT 
GTCACGTGACTTCACGTGACTATCTAAAAGTTTTTGTTTTTTTCTTTGCTT 
TTGTTTACGCAAACGGTTTTGTTCTCTTATCTTTTCTAAACCTTCGATGTT 

CTGATGTTTTTCCCAATTAGATACTTTAAAGACACCATTCACTTCTTCAAT
CATGCTTAGCTTTTCGAATGTTTGTAACGCTAATCTTATTGAATTGATAGG 
TCTATTAAATTCGOTAGCTAACATT^CTTCGTTATAGGGTAGACTTTCGGA 
TAACATAATGTATCCTTGTTCGTTATACTTTCCAGCTAATGTCAGCAACTT 
AACCCATAAAGTGATGATTGTATCTCGTTCTGGCAGTGCTTCTATATACTT 

GATTTTGCTATCATCGAACATTCC
AAACATCACCTTTAAATAACTGATTAATTTGATGAATAATATTAATTGAAC
TAAAATCATCCGTTAATCTCATGGAATATGCATAAGAATAATTCCAATTTT 
TTAAATGTGCCAAAAAATAATCTAAAGATTCTACAGTTTGCGATATGTCAT 
CAATAATATATCCATTATGCATATGTTTAACATAATCTGTTCTCTTTTTAT 

TAATTTGTGGTATACCCGCGCCAATACAACAGATTTGTAAAAAAAGTTTCG
GCTCTAAAGATAAATCTACTACAACTCTTAATTTTGAAATAACGCTTACAA 
GATCTTCTTCAAACGGTACTGTTTCAATATCAATGATTGTTTCTTTTTTAT 
TCTGTATCACGTCTTCAATTACATCACTTATTTCGTTTTTCTCTTGGTGAT 
ATAAATCATTGAGATGAGCAACTTCATCTATAAGATTTTCCGTAAGATTAT 

CTCTACTCTTAGTTAAAATTTTCAACTTATAGCCATCTTTATGTTGAATAT
ATTGAAATAAGCTGTTTACAATCTCTCGTATTTCAATCTCGTCCAAACCAT 
CTATCCATAATCCAATATAAGTGTCATACAATTGACTACTTTGGTTAGGTA 
ATGACTTCGCATCAAAAGGCGTTACCCTTAAAAGATTAATATCATTCATGT 
TCAATCCTGCGTAGTGTTCAATCATATTTTGATTAGCTTGTGTATCAACGA 

GACAATAATATGCTCTAGAAATAGAGTCATATAAATCTTGTGTCACCACTT
TATTTCTCTCAGTAAAAATAGAAAAACATAAAGACGATGCATCAATAGTGC 
GTGCCATCATTGAATTATGACGTTCATCAGATGCCACTATAACAGAATCAT 
CTTCATTCAATTCTCTTTCTACATATGATTGGAATTTCTCTTCTATCAACT 
CAGCCATAGATGAATACGTAACCCTTTTAAATTTTGATTGGAAATTTTTAT 

TAATCGTTACTGTATTAACATTTAAGTCTTCAACAAATATTTCTTCCCCAT
CTTTTGAAAAATAGTGTTTTTTATTGTTATCACCATCAGGTGTATAAGTAC 
GCACTGCCGATATAAATCCTCTGTCATCGAATACAAATCGTCTTTGAATTG 
TACTATACTTGTAATCCTCTACCCACATCAGGTACCCCTCTTGGCTGAAGT 
GAATTTTAGAAAACGTGTTATCACCTGTAATCGCTTGAATTTGAAAGGGAG 

TAAAAATAAATTCAGTGCCTTCTGGCCATGAAAGATCGCGATAATCAATAG
CTTGAGGCGTTTGATGTCCAACGCCCTGTATCTCATCAAATACAGACCAAT 
AATGACTTTCATACAAATCATATCGATGGAGGAATGTTCTAAGATATGGAC 
TAAAATTTAAAACTATCAATTGATAATCCACATTATTCGAACTATGCATTG 
TCATTAAACTAATCATATCGTCAAAATCTGTATACTGTTTTTTTAGATAGA 

ATGGTCTTGAGGTACTTTCCCACCATCTGTTACGGCTATACCAAGCTGGTA
TAAATCGTTTCATATTCCAAAACCTCCTCCCTACCAATATTTATTTAAAAT 
TTGCTTATAGCTGTCAAAGTATAAGTACGCTTTAATTACTATGGGAGGTTT 
AAAGCACGGAACTGTTCCAATATTTGAAAAAGGAACATCAAGCTATGTTGA 
CCCTATCCCCCAAGCATTGATTTTAACAGCTATCGTTATCGCCTTTGCTAC 

AACAGCTTTCTTTTTAGTTCTTGCATTTAGAACATATAAAGAACTAGGCAC
TGATAACGTTGAGCTAATGAAAGGAGCGCCAGAAGATGATAGAGAGTAACT 
TGCTTGTTTTAACTTTAGTCATACCTATCCTAACTGCAATTTTACTCATAT 
TTATTGGAAAGCGCCCAATAATAAAAAGATATGTTGCACTTGTCGGTACAT 
TGCTAACATTGGTCATTGCATTGATTAACTTAAAGAATGTGCTACGAGATG 

GACCAATCAAGGTTGAACTGGGCTCTTGGAAAGCCCCATACAGCATTGTAT
TTGTAGTCGATATTTTTAGCGCTCTACTCATCATCACAAGTATTATTATTA 
CCATATTAGTCATTTTATACTCATATCGTTCTATCGGTTTAGATAGAGAAC 
GTCATTATTATTATTTCTCTATCATGTTTATGTTGATTGGTATTATTGGCT 
CATTTACAACAGGAGATATTTTCAACTTGTTTGTGTTCTTTGAAGTCTTTT 

TAATGTCTTCATATTGTTTACTCGTTATTGGTACTACTAAAATACAATTAC
AAGAAACAATTAAGTATATTTTAGTCAATGTTGTTTCATCGTCTTTCTTTG 
TCATGGGTGTTGCAGTTTTATATTCAGTTGTAGGAACTTTAAATCTCGCTC 
ATATTAGTGAAAGATTGTCACAACTTTCTGTACATGACAGTGGCTTAGTCA 
ATATTGTTTTTATTTTATTTATCTTTGTCTTTGCCACTAAAGCAGGCGTTT 

TTCCTATGTACGTATGGCTACCTGGTGCTTATTATGCCCCTCCAGTAGCGA
TCATCACGTTCTTTGGTGCACTATTGACTAAAGTGGGTGTATACGCAATTG 
CGAGAACTCTAAGTTTATTCTTTAATAATACAGTAAGCTTTTCTCATTATG 
TCATCCTTTTCTTAGCATTACTTACAATTATTTTTGGATGTATAGGTGCGA 
TAGCTTACTATGATACGAAGAAAATCATCCTTTACAATATTATGATTGCAG 

TAGGTGTCATATTAGTTGGTATTGCTATGATGAACGAATCAGGCATGACTG
GTGCAATATATTACACACTACATGATATGTTAGTTAAAGCTTCATTGTTCT 
TACTCATTGGCGTCATGTACAAAATCACTAAAACGACTGACTTACGTCATT 
TTGGTGGCTTGATAAAAGGGTATCCTATTCTAGGTTGGACATTCTTTATTG 
CAGCGCTAAGCTTAGCGGGTATACCACCTTTTAGTGGTTTCTACGGTAAAT 






TCTATATTGTTCGAGCGACCTTTGAAAAAGGATTTTATCTAAGTGGTATCA 
TTGTACTTTTATCAAGTTTAATCGTGTTATATTCAGTCATACGTATl^ 
TAAAAGGATTTTTCGGTGAAGTTGAAGGATATACTTTATCTAAAAAGGTAA 
ATGTTAAATATCTAACAACTATCGCTGTTGCATCTACAGT 


Sequence 3387

step.1002d07.cons.ok

CAAAATACTTTAATATTTCAGCTGCAACATGATAATCTCTCAAATCCTCAT 

10 CAAAACCAAGAGCAAGATTTGCAGTAACTGTATCATAACCTTTTTCTATCA 
ACTCATATGCGCGCAACTTATTAATTAACCCTATTCCTCTACCTTCTTGAG 
GTAAATAAATAATCATTCCACCATGTTCGTCAATATATTTCATTGACGCTT 
CAAGTTGTGCCCCGCAATCACATCTTTGACTATGAAAAATATCCCCAGTCA 
GACAAGCAG7UVTGCATACGTACATTAGGATTGCTTTTTAAATCTCCTTTAA 

15 CAATAGCTACGATTTCTTCATCGCTATAATCCGTTGTAAATCCATACATAT
CAAAATGACCAAAATCAGTTGGCATCTTGACCTTTGCCTTAAGATTAACAT 
TAAGTTCAACAGCCTTACGAAAAGCAACCAAACTTTTTATA6TAATCATTT 
TTAAATGGTGGCGTTCTTTAAATGACTGGAGATCTTCACCCTTAGCCATTG 
TCCCATCATCATTCATAATTTCGCAGATTACTCCAGCTGGTTGTGCTCCTG 

20 TTAACCGTGCCAAATCTACGGCAGCTTCAGTATGACCATTACGTGTTAACA 
CACCATTCTCTTTTGCTATAAGTGGAAATAAGTGCCCCGGACGATGAAAAT 
CTTCAGGATTAGTATTTTCATCTATGAGTGCTCTAGCCGTTTGTGTACGTT 
CATGTGCACTGATTCCTGTAGTAGTTTTATAATGATCAATGCTTACAGTAA 
AATGTGTGCCATAAATATCAGTGTTATTTTGCTCCATAGATTGTAGTTTTA 

25 ATCTTTCAGCTATAGATTTATCAATTGGTGCACAAATCAGACCACGACCCT 
CTTTAGCCATAAAATTAATGGTATTATCATCCATCCATTCCGTAACAGCTA 
CAAGATCTCCTTCATTTTCTCTATCTTCATCGTCAACTACAATAATGCTCT 
CTCCATTTCTTAAAGCCTCTATAGCCAACTCAATTGTATCGAATTGCATGT 
GATATTCCTCCTAAAAACCAAATGCTCTAAGCTTTTCTTCTGTTAACTGCG 

30 ATTGATTTTGATTCATGATGTTTTCAACATATTTGAATAGTACGTCAGACT 
CCAAGTGCACTTTGTCGCCCACTTTTTTAGATGAAAGAATAGTAGATCGAC 
GTGTTTCTGGTATAAGATGTATATCAAAAGTATAATCATGTAAATCAAATA 
CAGTAAGACTAACTCCGTCTACAGTTATAGAACCTTGCTTTACCATTTGAT 
TCAAAATATTTTTAGTTGTTTTAATAGAAATAATTTTTGAGTTAGCAGTTT 

35 CATTGATTTTAGAAATTGTTCCAAGCTCATCAACATGACCTAACACGAAAT 
GTCCACCAAATCTCCCACTTCCACTCATGGCTCTTTCGAGATTAACTTCTG 
TATTACGTTGAACACTTCCAAGATATGTTTTGTTTTCAGTCCCTTTGATGA 
CTTGAACTGAAAAACTTGAGTCAGTGAAATCTATCACAGTTAAACATGCAC 
CGTTAACACTTATTGAATCACCAATATGCATATCAACTAAAATGTTTTGTG 

40 CTTTAATTTCAAGCGTTCTTACTGATTGTTCAGAGCGAACTTGTTGTACAG 
TACCTATTTCTTCAATGATACCTGTAAACATAGACATTCACTTCTTTCGTA 
ATTTCAATTTTAAATTTTGATTAATTAACTTGGAATCAACAATTTCAAATT 
GAGTTGCTTCAGGCAAATCAATGACCTCGTCAGTCTTATAAAATTGATGTT 
TGCCAGAACCACCAATTAATTTCGGGGCTATATATAAAATGAGTTCATTTA 

45 GATGTTTGGATTGGAGAAATTGAGATGTAATATTTGGGCCTGCCTCGACTA 
GCAGTTTCCCAATACCTCTTTGATATAAGTCTTGTAATATTGTCGTTGTAT 
CACAATTACTAATATTTATTATTTTAATAAAACTTTTATTTGTTTTTAATT 
TTTCATTTTCAGTGTAAATCCATATCTCCGATGCAGTATCTTTAAATATTT 
GTTGATTAAAATCGAGTTGACCTTTCTTAGAAAGAATAACTCGAATCGGAT 

50 GCTTTCCATCAGGAACCCTGGTTGTATACAATGGATTGTCTGCTTCAATGG 
TTCTACGCCCAGTAATAACTGCATCATGCTCATGTCTTAATTGATAAACAT 
CTTCTTTAACTTCTTTGTTTGTTATCCACTTACTTTCATTAAAGTCTGTTG 
CTTGTTTACCATCTAGACTAGATGAGACCTTTACAGTTACTTCTGGAACTT 
CGTTTCTTTTAGCAGTAAAJUUVGTCACGGTATAATGCAGCTGCATTTTCAT 

55 TATATTGAAATTCAACCTCTATACCAGCTTCTCTCAGAATCTCGTCACCCT 
TACTTACTAAAGTAGTATCTTTAACAGCATAGATGACCTTAGATATGCCCG 
CTTCAATGATTTTATGCACACAAGGTGGTGTTGAACCATGGTGTGTGCAAG 
GTTCTAATGAAACGTATATGGTAGCACCTTGGGTATTTAAACCTGCCATTT 
CAATAGCTTGTACTTCGGCATGTTTATCTCCCTTTTTTAAATGTGCACCTA 
AACCTACAATCCTACCGTTTTTAACAACAACGGATCCTACTGGTGGATTAA
CACCTGTTTGTCCATTTACCATTTTTGCTAGTTGAATAGCATCATCCATAA 
ATCTACTCAATTGATCACCTCAAAAAAAACCTTATACCTATATTGTAGGCA 
TAAGGAGATATCTATATCTTTAAATAAGCCTATTAGCATACTTAAAATTGC 
5 ACATAAGTATACAATACGCTTATCATAACTACACGAATTTGTGTAAGTCGA 
TAATTCAACTTCTTTCTCCCATCCAGACTCTAACTGTCGGCTCTAGATTCA 
CACTAGATCAGCCACTAACATCAAAAACAATATTAGTGGGTCGCAGGCTCA 
ATTTACTGCCGGTTGGGAATTTCACCCTGCCCCGAAAGAAATATATGAAAT 
TGTTATGATTAATTGAGTCCAGTAGTTCTTCACTACATTGTAAATCTTACT 

10 ATATTTAAATTTAAAATACAATTTTATAGCTTACTTTATTAAAAGATTTTA
TGAGAATATATTAAACTCACAAACACGCTTAATGCATATTACTTATTTCTA 
TTTTCCTGTGTTATATATAATTGAGATACTGTATCAACAACTTGATTAACA 
ATCATCTTTACGCCATTTCTAAGTTTATTGACACCGTTGGTCATTTGTCCA 
ATTGCAATCACATTTTTTAAGGTTCCATATCTGGGGCTTATGACTTGATTT 

15 GTTTCTGGGATAATTTGAATGCCACCCATAGGATGACGTTGAACAATTTGT
CTATTTTCTAAGTTTAAAATTAATTGATCATCCTCATCTAATTGAGAAAGA 
TGTGTTTTAGAGCCCGTTGCATTAATAACGATGTCAAACAACTCATAATTT 
TGCGTAGTGTCGTTATACTTCAACATTAAGAGGAATCCTGCACAAAAATGT 
CTCCATTCGGTAGCCATTTTTGTTAAAAACATTGATTTGACAGTGTCAAAT 

20 AGAATTCCTTTTTATGCGCTGAAATTTGTTATATGGAGACTGAATGTA 



Sequence 3388 

step.1002d08.cons.ok

25 GATTTAGAACAGAATGTTACAGTTACACAAGTTAATGATGCATTTAAAAAT 
GCCGATTTATCAGGTGTTCTTGATGTTGAAGAAGCTCCTTTAGTTTCTGTA 
GACTTTAACACAAATCCTCATTCAGCAATTATTGATTCTCAATCTACGATG 
GTTATGGGACAAAATAAGGTGAAAGTTATCGCTTGGTATGATAATGAATGG 
GGTTATTCGAATAGAGTTGTTGAAGTAGCTGACAAAATTGGACAATTAATT 

30 GATGATAAAGCAATGGTAAAAGCCATTTAATTAAATATTTATACTTACACT 
AAACCTGAGATAGCCATATTATCTCAGGTTTATTTATATACAAATAGCCCT 
AAACAAAGTTGAAGCACCATTGATGTTATTATAGATAACCAATTAGCTTTA 
GACATATTTCTAGATACTTTCTTTATAAAAAAATGATCCATAGTCATTATT 
TATTAATCAAGTTGAGATGATTTTGAAATGAGGTGATTTTAACGAATCATC 

35 TTTCAAATTATAATTTAATAATGGTACTTTTGGTCGTATTTATAAAATTGT 
GTAGTACATTTAAATGGTTGAAATTCATTTAATATGGTTAATCACGTTAAA 
ATGTATTATTTTTAATAAATAACATAGCATTCTAAAACCTTTATTTATAAG 
ACAATTTTTCCTTTTTTACACATAGTGGTATTAACAATGTATCATTACATT 
CTTTCGTTTATAGTTGCAATTGACCTTTCTTTTTTACACAGTACTTTTATA 

40 GTATAATGTACGGTGATTAAACTTGGGAGGGTGACATATATGAAATGCCCA 
AAATGTAATTCTACACATTCCAGAGTGGTTGATTCAAGACATGCAGATGAG 
GCCAATGCGATTAGACGTAGAAGAGAATGTGAAAATTGCGGAACGCGTTTT 
ACAACATTTGAACATATTGAAGTTAGTCCATTAATAGTAGTGAAGAAAGAT 
GGGACTAGAGAACAATTTTTAAGAGAAAAAATATTAAATGGTTTAGTAAGA 

45 TCTTGCGAGAAACGACCAGTACGTTATCAAC7U\CTTGAAGACATAACTAAT 
AAAGTGGAGTGGCAACTTAGAGATGAGGGACAAACTGAAATTTCTTCTAGA 
GAAATTGGAGAGCATGTTATGT^TTTGTTAATGCATGTTGACCAAGTTTCC 
TATGTAAGATTTGCATCTGTATATAAAGAATTCAAAGATGTTGATCAACTC 
TTAGAGTCAATGCAAGGTATCTTGAGTGATAATAAACGGAGTGATAAATAG 

50 ATGGGGTTACAAACCTATGAATATGGTCTAAAACCACAAGATGGATTTGAG 
GTGATTACACATTTCGAATTCACCTCACAACATTTAGATATTTTAAATCGA 
CTATTCACCCCTTTAATCGGAGTTGAATCAATTGGACTCTATCATTTTATG 
AGTCAATTCATAGATAAAAGTCAACAACTCGGGTTAACGCATTATATATTC 
ATGAATGAACTAAAAATTAACTTATTAGATTTCAGGGAGCAAATGGACAAT 

55 TTAGAGGCTATTGGATTGATTAAAACATTTGTAAGGCATGAAGAAAAGTAC 
TCTCACTTTGTTTATGAGTTAATTCAGCCTCCAACAGCCTATCAATTTTTT 
AATGATCCTATGTTATCAGTATTTTTATTTAGTGAGGTTGATAAAAAACGT 
TATCAAGCACTTAAATCTTATTTCGAAAAAGATGAGAAAGATTTAAGCAAA 
TATCAACAGACAACTAGAAAATTTACAGAAGTATTCAACGTACCTAAAAAG 
GTCAATGTTTCTGATCAAATTAATTTAAAGCAAATCAAACACTATGATGGT
ATAGATTTATCTT^TGAAACTTTTGATTTTGAAATGTTGAGACAGATGTTG 
AACCATCATTTTATTAGTAATGAAATTATCGATAAAGAAGCTAAGAATTTG 
ATTATACAACTTGCGACACTTTATGGAATTACTGAAGATGGTATGAAAAAT 
5 GTTATATTAAGTTCCATTACCAGTGCACAACAATTATCTTTTGAAGAAATG 
CGTAAGAAAGCTAGAACTTATTACCTGATTGAACATGATAATCAATTACCA 
AAATTAGAGCATCAAACAAATAAAATTAACGATGAAAAAAAAGATCGACAA 
GCGGAAGATACAACAAATGATTGGTTACAACTGCTAGATGAAACAAGTCCG 
ATTGATATGTTAGCAAGTTGGTCTGATTCGGAACCTACACAGTCGCAAAAA 

10 AGTATGATAGAAGAATTGATTAACCGTGAAAAAATGAATTTTGGTGTAATC
AATATACTTTTACAGTTTGTTATGTTAAAAGAAGATATGAAGTTGCCAAAA 
TCTTATATTTTTGAAATTGCTTCCAACTGGAAGAAAATTGGTATTTCAAAT 
GCCAAACAAGCATATGAATATGCATTACAAGTTAATCAACCTAAAAATTAC 
GAAACACATTCTAATGATAAACGACAGAACAATCGTGGAAGACAAAATCAA 

15 TTTTTATCCAAAGAAAAGACACCTAAATGGCTTCAAAATAGGGACGATCAA 
GAAGAAAATAAAGAAATAAATGATGACACTCTCGAAGAAGATCGACAAGCA 
TTTCTTGAAAAGTTAAATCAAAAGTGGAAGGAGGAAGATAACTAATGAAAA 
GTTTCAGAAATATCATGGGCGATTCTCAAAATCTAGATAAACGTATACAAA 
AAATAAAACAAAATGTAATCAATGATACTGACGTTAAACATTTTCTTGAGA 

20 AAAATCGTAGTAATATAACTAATGAGATGATAGACGAAGATTTAAATGTTC 
TTCAAGAGTATAAAGATCAACAAAAAGTTTATGATGGACATCGCTATGATG 
ATTGTCCGAATTTTGTAAAAGGACATGTTCCTGAACTATATATTGAAAATG 
AAAGAATCAAAATTAGATATCTACCTTGCCCGTGTAAAATTAAACATGATG 
AGGAACGATTTGATTCACAACTTATTACATCTCACCATATGCAAAGAGATA 

25 CACTTCATGCAAAGCTCAAAGATATTTATATGAATAATCGAGAGAGACTTG 
ATGTAGCAATGGCAGCTGATCAAATCTGTACAGCAATTACTAACGATGAAA 
AAGTAAAGGGGTTATATTTATATGGTCCTTTTGGTACAGGAAAATCATTCA 
TATTGGGTGCTATTGCAAATCAACTTAAATCGCAAAAGATTTCATCAACAA 
TTGTATATTTACCAGAATTTATTCGCACTTTAAAAGGTGGCTTTAAAGACG 

30 GTAGTTTTGAGAAAAAATTACAACGTGTGCGAGAAGCTAATATTTTGATGT 
TAGATGATATTGGCGCAGAAGAAGTCACACCGTGGGTAAGAGATGAAGTGA 
TTGGTCCTTTATTACA 



Sequence 3389

step.1002d09.cons.ok

CAATGGCTTTGGGAAACTTCACTTTATCAAAAATGGTTGATTTCTTTTATT 
TTAATTTGCAATATATTTGGTACGATATACGGTTATCATTGGTATAGTAGT 
CAATTAGCTAACACACCAAATTATTTTAAATTATTTGTTCCAGATAGTCCC 

40 ACTGCTACATTATTTTTATGTTTATCATTATTATTCATACTATTTAATAAG 
CGTTATGCCCTGATTGACGCTTTAGCTTTTATAACTCTATTTAAATATGGT 
AGTTGGGCTGTAATTATGAATATTTTGATGTTTATAAAAATGAATGATATT 
GCCATAAACGGACTTATGTTGCTCATTTCTCACGCAATTATGATATTAGAA 
GCTATCTATTTTTATCCTCGTTTTAAAATATCTAAATTGGCTGGATTGATG 

45 AGTTTCATATGGGTGACGATCAATGACGTAATAGATTACATATACGGACAA 
TATCCCTACTATGATTTTATCGCCAAACATTTAATTGAAGTAGGGGTATTG 
GCTTATAGTCTCACTATCATTTCGTATATTTTATTTTTAAAATTAdAAAAG 
TGGTTGAAAGTTAAAACATTTGATTAAAGACAAATTTGACAGTAAAATGAA 
AATAAAGGAGAGTGAGTTTAATGTCTTTAATATCAATGATTATATATTTCG 

50 TAATATTAATGGTCATTCCAATGTGGGCTCAACACAAAGTTAAATCTAATT 
ATGAAAAATACTCACAAGTGAGATCCACAAGCGGTAAAACTGGTCGTGAAG 
TTGCTGAAGAAATTTTACATGCTAATGGCATTTACGATGTTGACGTCGTTA 
AAGGCGAAGGTTTCTTAACTGATCACTATGACCCAAATAAAAAAGTTGTTT 
GTTTATCTCCTGCAAACTACGATAGACCATCAGTTGCTGGTACTGCAATTG 

55 CAGCCCATGAAGTAGGACATGCAATTCAACATCAGCAAGGATACGCACCAT 
TAAGATTTAGAACAGCATTAGTACCATTAGCTAATATTGGTAGTTCTTTAA 
GCTACATTATTATTATGGTAGGTATTGTTTTAACAGCTCTCGGTAGTGTCT 
TTGGCTCTACAGTTCTATGGGTTGGTGCTGGTTTAATGTCACTTGCAGTTT 
TATTCTCTATCGTGACATTACCTGTTGAGTTTAACGCTAGTTCAAGAGCGA 
TGAAACAAATAACAGCTTTAAATATCGTTAATGAGAAAGAATACAAACACG
CGAAAAAAGTTTTATCTGCAGCAGCAATGACATATGTGGCATCTACTGCAG 
TTGCGTTAGCTGAACTTGCTCGTATTATATTAATTGCAAGATCAAGTGATT 
AATCATATATAACAACGCAGTTCCTGGTAATCATTACAATGGATTTGTAAT 
5 GGTTACCTTTTTTTATTTGAATACGTGACAAAATCCTATCTTTTAATCCAT 
ATTAAACATTGCACTTGAAATGCAAGTCACTTTATTTTGTAATATACTATA 
TAAGAATAGCGATAAAGGAGTACTTCTATATGAAATCTATGGAGCAAATGC 
AACAAGAAGTTGATGACTATATCAGTCAGTTTAAAGCTGGTTATTTTTCTC 
CACTTGCTAATCTAGCAAGAATGACAGAAGAAGTTGGAGAATTAGCACGCG 

10 AAATTAACCATCATTTTGGAGAGAAAAAAAAGAAAGATACAGAAGAAGATA
ATACAATTAAAGCTGAATTGGGAGACAATTTATTTGTTTTACTGTGTATTG 
CAAACTCTCTAAACATAGATATGACAGAAAGCTTTAATGACACGATGAATA 
AGTTTAATACAAGAGATAAAAATCGCTTTGAAAGAAAATAATTTAAGAAAG 
GATGGGCACAAATGAAAATCGGTATAACGTGTTACCCGTCCATGGGTGGTT 

15 CTGGCATTATTGCTACAGAACTCGGTATTAAAATGGCAGAACGTGGCCATG
AAGTTCACTTTATTACCTCAAACATACCCTTTAGAATACGCAAACCTTTAC 
CTAACATGACGTTCCACCAAGTTGAAGTCAATCAATATGCCGTATTCCAAT 
ATCCACCATACGATATTACATTAAGTACAAAAATTTCTGACGTTATACAAG 
AATATGATTTAGACATATTGCATATGCATTATGCTGTACCTCACGCTGTAT 

20 GTGGTATATTAGCGAAACAAATGTCAGGTAAAAACGTCAAAATTATGACAA 
CACTACATGGCACTGATATAACTGTGTTAGGTTATGACCATACTTTACAAA 
ACGCGATAAAATTTGGCATAGAACAAAGTGATATTGTAACAAGTGTTAGCC 
ATTCTCTAGCACAGCAAACTTATGAAATTATCAATACTAAAAAGGAAATCA 
TCCCTATATATAATTTTATAAGGGAAAATGAATTCCCAACTCGGCATAATG 

25 AAGAATTAAAAGATTGTTATGGTATTTCACCTGAAGAAAAGGTATTGATAC 
ATGTTTCTAATTTCAGAAAAGTAAAACGTATTGATACAGTGATTGAGACAT 
TTGCAAAAGTTCATGAGAGTATACCATCCAAGTTGATACTTTTAGGAGATG 
GTCCAGAATTAATCGATATGCGACATAAAGCACGAGAACTTGATGTTGAAA 
CACACGTACTCTTTTTAGGCAAACAAAATGACGTAAGCGCATTCTACCAAC 

30 TATCTGATTTAGTACTACTCTTAAGTGAGAAAGAAAGTTTTGGATTAACTC 
TCTTAGAAGCAATGAAAACAGGCGTCTTACCTATAGGGAGTCATGCAGGTG 
GTATTAAAGAGGTCATCAGACATGAAGAAACTGGATl^rATAGTAGATATAG 
GGGATAGTACACAAGCTGCAAAATATGCTATTAAACTTTTATCAAATCCAG 
AGTTATATCAAAAAATGCAATCACAAATGCTGAAAGATATTGAAGCAAGAT 

35 TTAGTTCAGATTTAATTACTGACCAATATGAAAACTATTATCGAAAGATGC 
TAGAACAAGGTGAGAACAACAATGAGTCATGAGATATTTGAACAAGCAAAA 
CCTATTTTAAAAAAGATTCAAAATAAAGGTTTTAAGGCTTATTT 



Sequence 3390
TTAAAGATTTGAAAGAACTTTCTGACGCAGATATCAAAAGTTTTGAAGAAC
GTTTAGATAAAGTCGATAATCAATCAAGTATTGACCGTATTATAAATGATG 
CAAAAGATAAAAATAATCATTTAAAATCGACAGACTCTAGTGCCACATCAT 

45 CAAAAACTGAAGATGACGATACATCTGAAAAAGATAATGATGATATGACTA 
AAGACTTAGATAAAATACTGTCGGATTTAGATTCAATTGCTAAAAATGTTG 
ATAACCGTCAACAAGGTGAAAATAGTGCTTCTAAACCTAGTGACTCAACAA 
CCGATGAAAAAGATGATTCAAATAATAAAGTACACGATACAAATGCTAGTA 
CACGCAATGCAACTACTGATGATTCTGAAGAGTCGGTTATTGATAAATTAG 

50 ATAAAATCCAACAAGATTTTAAGTCTGACTCTAATAATAAGCTTTCTGAAC 
AAAGCGATCAGCAAGCATCACCATCTAATAAAAACGAAAATAACAAAGAAG 
AATCTAGTACGACAACAAATCAATCCGATAGTGATAGTAAAGACGATAAAA 
GTAATGATGGTCGTCGCTCAACATTAGAACGTATAGCATCAGATACTGATC 
AAATTAGGGATTCAAAAGATCAACATGTCACAGATGAAAAACAAGATATAC 

55 AAGCAATTACACGTTCACTACAAGGTAGTGATAAGATTGAAAAAGCACTTG 
CTAAGGTACAATCTGACAATCAATCACTAGATTCTAATTATATAAATAATA 
AATTAATGAATTTAAGATCACTAGATACAAAAGTAGAGGATAATAACACTT 
TATCTGATGATAAGAAACAAGCGCTTAAACAAGAAATTGATAAGACTAAGC 
AAAGTATTGACCGACAAAGAAATATTATTATAGATCAACTCAATGGTGCTA 
GTAATAAAAAACAAGCAACCGAAGATATCTTAAATAGTGTTTTCAGCAAAA
ATGAAGTAGAAGACATAATGAAACGTATTAAAACAAATGGCCGAAGTAATG 
AAGATATTGCTAATCAAATTGCCAAGCAAATTGATGGTCTTGCATTAACTT 
CTAGTGATGATATTTTAAAATCAATGTTAGATCAATCTAAAGATAAAGAAA 
5 GTTTAATTAAACAATTGTTGACGACACGACTTGGTAATGATGAAGCAGATC 
GTATTGCTAAAAAATTGTTAAGCCAAAACTTGTCGAATTCTCAAATCGTAG 
AACAATTAAAACGTCATTTCAATAGTCAAGGAACAGCTACAGCTGATGATA 
TATTGAATGGTGTGATTAATGATGCTAAAGACAAAAGACAAGCGATTGAAA 
CAATATTACAAACCCGTATCAATAAAGACAAAGCTAAAATTATCGCTGATG 

10 TTATTGCGCGTGTACAAAAGGACAAATCAGATATCATGGATCTCATTCACT
CTGCGATTGAAGGCAAGGCAAATGATTTATTAGATATAGAAAAACGAGCAA 
AACAAGCTAAGAAAGATTTAGAATATATTTTAGATCCTATAAAGAATAGAC 
CATCCTTGTTAGATCGTATTAACAAAGGTGTCGGTGATTCTAATTCAATAT 
TTGATAGACCAAGTTTACTTGATAAACTTCACTCAAGAGGATCTATTCTTG 

15 ATAAATTAGATCATTCGGCACCGGAGAATGGATTATCTTTAGATAATAAAG 
GTGGCCTTTTAAGTGATCTATTTGACGACGATGGTAATATCTCATTACCAG 
CGACAGGTGAAGTCATCAAACAACATTGGATACCAGTGGCTGTTGTACTCA 
TGTCATTAGGTGGAGCGCTCATCTTTATGGCGCGTAGAAAAAAACACCAAA 
ATTAATTTAAAAATTATATATAGTAATAGGCATCATCAATCTTATGTGTAG 

20 TTAAACTATTTCACAGTTAGTGATTGAGGTGGTGCAACGCTTACTCAAAGT 
TTATAACAAGAGTCAGTATAAAAGAGTAGGACTAAGAATTTAAAATCAATT 
CTGTCCTACTCTTTTTTGACATGCAATAATGATTGTTATATATAATGAATA 
ATATACTTTTAGAAATCACTTGAGTAAATTATAAAAATGGACTTTTCAATG 
TAAAGTTACTCTAATATAGTTATTGAATATAAATAAGCACAAAGATGTATT 

25 ATAAATTTGAATACATCTTTGTGCTATTATTTTGTGGATGGTACCTCAAAT 
AATTTTAAATTTTTACTATTTACCTTTAATGTAATCTAAATCTTGAGCATC 
TTCTGCATCAATAGTTCTGAATGAAATGACACCTAATCCTTTAACATTTGA 
TTCAGAAATAACAATTGAACCGTCGTCATTAACTTTTTCAACGAAAGCAAC 
ATGACCATACTGTGTATCAGCACCTAATTGTCCAGCTTCAAACACAACTGC 

30 AGTATGATTTTTAGGTGTGTGCGTTACCGTATAGCCTTCACTTTCAGCACG 
ATTATTCCAATTATGAGCATCACCTAAGTCACCAGAAATGGATGCATCAAA 
TTGATTCATACGGTGGTACACATACCAAGTACATTGGCCATGTGGGTAAGG 
TGATGTACCGGAAGTTTCAGTAAATGGTTTGAAGTCATTACCAGAGACATC 
CGTACCTATTGATTTGTTGTATTTCTTCATGTTAGGCATTTTTTCTTTGTC 

35 AAAAGATGTTAAATGATAATGTTTAATAATACTATTTAATTTTTTAGAATA 
ATTAGGATCTGTGGCGTATGAGCGTGACAGATGTGAAGTAGCATCTTTATA 
TGATAGAGCTTCACTCTTCCAAGTTGGTTTATAAATTGACGGATTACCATC 
GATACCATGTTTGATTAAATCTGCATAATCTTCAAGAGATTGTTTAGTACT 
TGGGTATTTACGGAAACCTGCTTGGATACTGAACATATGATTACTGCTATC 

40 AGCTTCTAAAGTATTAAAAGTTACAGATTGTCCTTTGTAGTCACCTTTGAT 
TCCAAACAAGTTATGATTTGGTGATTGTGCAAGTGAACTTTTTCCAGAGTC 
AGATTCTAAAATAGCTTGAGCAATCATAACTGATGCATATATATCTTGGTC 
TTTTCCAATCTGATGCGCATCTTTAGCAATTGATTGAATGAAATCACGTGT 
GTCTTGACTGTCAACAACATTAAATGAATCAGAAGATAAGTCACCATTGTC 

45 TAATTCAGGTAGTTGTTGGAAAAGACTTGTTGAACGTGTATTTGAGCGTTT 
AATGTCATCCTCAAATGATTGTGCAGGTTTTGATTGATGTTTTAATTC 
ATCTGTTGGTAATTGTGGATTCTTGTCGGCATTATCACTTTGAGATGTTTT 
AGTGTGACTTGTATTCTTGCTCTTATTGTAATCTTTTTCTGTTTTCTTAGC 
GTCCTGACTATATTCATCTAATATAGAGTCTAATGCTGAATCTGTAATACT 

50 AGAATATTTCTTGTTACGTTCTGAGCTATCATCATTCGCTTGAGTTTGTTG 
TTGCTCAGAAGAAGATTGATTTTTGTTTGTATCT 



Sequence 3391 
TCATAGGGCTTAGATAATGCTTTATCCAAACCCACCTTTCCATCATTCTCT
ATATCAACAGTATAATTTTCATGAGTTAACTCAAGCTCTATAAATCTAGCA 
AGATTTTGCTCATCTTCTACAATTAAAATATTTGTCATATTTGCACCTCAC 
GCTACATCTTAAACAATTAACTeGTATTTCTAAAGATGTAATTCAAAATTG 
TAAATGATTTGTTATGAAACTAGTATCATTTTACCATGAACCACGATAATA
GATAGTCGAATAAGTAGTCTGTATTTAGAAAAAGTCACTCTAGTATTATCA 

TTGTATTTAAGTTAAATGGAAAAAAATATATATTTCTAATTGACAATCATT 
ATCAATCATGTATAATGATAATTGAAAATGATTATCAATACCAATTGAAAA 
5 ACATTCCCCCCACATACAAGTTGTTCTTTTGGATTGGTCATTTTCAACTAT 
CCCCTTTTATATGCCCGTAAAAGACTAACGTTAAGAAATGACGTTTCAATA 
AAAGCAGTAGACCTTTGACACTTGAGGTCTGCTGTTTTTTCAATGATAATA 
TAATGAGAGAAAGTAAAATAATAAGTGACTAATAATCTTATCAAAATTTTT 
AGAACGCTCATCACTTTTGATTAATGAAATAATATATACAAAAGAGTCTGT 

10 AACACATGAAGTGCACAGACTCTTTCCTACGATTAGTCAAGTACTTATAAT
TAAAATAAGCATTACTTAACTAAACCGTTCATTTTAATTTTTTTGAATAAG 
GTGTTAAACTATATTTGATTGTACGTTTGGCTAATGTGGTGACATTTGATA 
ATCGTACAATTTATGATTCATTACAAGATAATGAATTGTTTTTAATAATCG 
ATTTATACAAGCAATGATGGCAGTCTTATGAGGTTTCTCATTAGGCTGCTT 

15 TCTTAGTTTGTAGTAATAATCGACGACATGATTGTCATAATGATGCTGCCC
TCTTATTATATTCAT7UVTCACCCAAAATAAAAGTTTTCTCGCTTTTTTATT 
ACCACGCTTGTTGATGGTATCTCTACAGTGTGTATGACCTGATTGATATCG 
TTTGATATCAATGCCAACAAAAGCATTGAGTTGTTTATTTGATTTAAATCG 
CTTAATATCACCAATCTCCCCAATAATCATAGCTGTGCTTAGCTTACCAAT 

20 ACCAGGTATCGAATGAATATTTTCAAAATAATCGAGTTGTTGTGCTAATTG 
AATCATGGCATCATCTAATTGTTTGAGATGATGAATAGATTGTTTTAATTG 
TTGAATAAGTAAGCGTAATTTTTCGACTAGAAAGGAATGTCTATCGACATT 
AGGATAGCTTTCTTGAGCAATCACCCTTAATTGAAGTGCATATTTTGTAGC 
TTTATCCATTGACATTCCCTTATCTGTAGAATTGAATATATGTGTAATCAG 

25 TACCTCCTTGTCGATATCAAGAACCATGTCTGGATGAGTAAAGATTTCTGC 
GATGTTGAGTGCAATGATTGAATATCGACTACTAAATAATCTTTCTAAACC 
AGGGAATGTTTGATGGAGTAATTCAAGGATCTGAAATTTAAGTCGATTTTG 
TTCATTCTCGATTTCTAGATGAAAACGGACGCGTTCTCTTAATTCAAAGAA 
TATTAACTCATGTATAGGTAAGTTGTCTGTTTGTTTAAGCGTCGGTCCTAA 

30 ACAAGCAAGCTTATGAGCATCTGCCTGATCAGTTTTCCATGATCTTAGAGC 
GCTCGTTTTAAATTTGGCTTCTAACGGATTCATTTGAATATAGTTAATTTG 
ATTTACACAACAAAATCGTTCCATACTTCTTGAATAGATACCTGTAGATTC 
AAAAATGAGTTGTGGGTGGTCTAAGTCATTCAAATACTTGAGTAAATAGTT 
GTAACCATTTTTATTATTTTGGATGAAAAACTCTTTTTGGAATTTTTCATT 

35 TTTATAATGTGCAACTACACTACTTCTTTTACTAATATCAACACCTAAGTA 
ATCGATAAAAAAACCTCCTTTGAATAATTGAGAAGCTAAAAACTTTACTTA 
ACCTTTCTCATTTCATTTTCCTATACACGGTTTCAAGAACCCAACATACTA 
CAAACGAATTTCAAAAGGCGAGAGTAAAGCTGACTTGTTTTTTATACGGAT 
TTAAAATCCAAGAGTCTGGACAGTCTACTTCTCTCTATAACTATAAAAAAT 

40 AGCTATGAAAAAATCTATCGTCATAGATTTCTTCATAGCTAATCTTAGTAT 
GTTTTTATTTTCATAATTAATTGTTATGTTGCTAATTTTGGATGGTTTTCA 
ATCTAATCAGCTTCGTTAGTCACATAATATTGAATGAACTAGGTTCAAACT 
AAAAGTAGTTGAGTGTTATCAAAAGAAAAAAGAAAGACCCTACTTATAATG 
TAAATATCAAAGGGACTGAAAAAGCTAAAGATGAGAAATTTAAGTTTATTA 

45 CGATAAAAGACTATTTAAACTTGCTAGATTATTTCAAGAAAAGAGATGAAG 
AAAGTTATGTTTTGCTATATCTATTAGGCATTACTGGCGCAAGATATAGCG 
ATGTCATTTATATGACTTACAAAGATCTAAACAAAGCGAATGGCATAATTC 
ATTTGCCTGGAACGAAAACAAAGAATTCAAAACGTGATGTAGAAGTTAATT 
CAAAAGATATCATGCACATAAATTCAAAATTAGCTAAAATGCCGCGTAGAA 

50 TTGATGGCAAGTTATTCTCGGTTAGTCATACATCAGTAAGTAAAGCGTTTA 
GAAAAGCAAAAGAAGTGATAGGATTAAACGATAATAATATAACTCCCTATT 
CACTCA6ACATACGCACACATCTTACTTACTATCTAAAGGCATACCAATCG 
AGTATATAAGTAAACGTTTAGGTCACGCTACTATATCACAAACGTTAGACA 
CGTATTCACATTTATTAGAAGAACATAAAAAAGAGCAAGGCCAACGTGTCA 

55 GAGAAATATTCTCTTGACACTTATTTGACACTTACTCTCTCGAAAGCCCGT 
CATATCAACGGTATAGTACGGAGAGTTTCATTACTTCTTAATGGTGGATTC 
ATTTTGTAGTTTGAATCATCGCCAGGAACATCATTTTCAGTTAATAATAAA 
TGATGTGGTGTAACTTCTGCTGTTACATGGATACCAGCTTTTTTAGCGTCT 
CTGAT 
5 TCAAGATTGGAAGTTACGTCATGTTCTAATTGAAATGCATGGTAATAATGG
TAGTATCGATAACGATCCTCCAGCTGCTATGCGTTACACAGAAGCTAAACT 
TAGTCAATTATCAGAAGAACTATTAAGGGATATTAATAAGGAAACAGTATC 
ATTTATTCCAAACTATGATGACACAACTTTGGAACCAATGGTATTACCAGC 
GAGATTTCCTAATTTATTAATTAATGGATCTACGGGGATTTCTTCAGGATA 

10 TGCTACTGATATCCCGCCGCATAACCTCGCCGAAGTAATACAAGGCACATT 
GAAGTATATCGATCAACCTGATATTACAATTAATCAACTGATGAAATATAT 
CAAAGGGCCTGACTTTCCTACAGGTGGTATCATTCAAGGAATAGAAGGTAT 
AAAAAAAGCGTATGAGACCGGTAAAGGAAAGGTTGTCGTGCGTTCACGAGT 
AGATGAAGAGCCTTTAAGAAGTGGACGTAAACAATTAATTGTGACTGAAAT 

15 TCCGTATGAAGTGAATAAAAGTAGTTTAGTTAAAAGAATTGACX5AATTACG
TGCCGATAAAAAGGTTGATGGTATTGTAGAAGTTCGAGATGAGACTGATAG 
AACTGGATTACGAATTGCAATCGAATTAAAA7VAAGATGCTAATAGCGAATC 
AATCAAAAACTATTTATATAAGAATTCGGATTTACAAATTTCATATAATTT 
TAATATGGTTGCTATTAGTGAAGGTCGCCCTAAGTTGATGGGATTACGTGA 

20 AATTATAGAAAGTTATTTAAATCATCAAATTGAAGTGGTTACAAATAGAAC 
GCGTTATGACTTAGAGCAAGCTGAAAAACGTATGCATATTGTGGAAGGATT 
AATGAAAGCTTTATCTATACTTGATGAAGTTATTGCATTGATACGTAATTC 
TAAAAATAAAAAAGATGCTAAAGATAATTTAGTTGCAGAGTATGACTTTAC 
TGAAGCTCAAGCAGAAGCTATTGTCATGTTACAGCTGTATAGATTAACAAA 

25 TACTGACATTGAAGCTTTGAAAAAAGAACATGAAGAGTTAGAAGCTTTAAT 
AAAAGAATTAAGAAATATCTTAGATAATCATGAGGCACTTTTAGCAGTAAT 
TAAAGATGAACTAAATGAAATTAAAAAGAAATTTAAAGTGGATCGACTATC 
TACAATCGAAGCTGAAATTTCCGAAATCAAAATTGATAAAGAAGTTATGGT 
GCCTAGTGAAGAAGTGATTTTAAGTTTGACGCAACATGGCTATATAAAACG 

30 TACATCTACACGTAGTTTTAACGCAAGTGGTGTGACTGAAATCGGTTTGAA 
GGACGGCGACCGTTTATTAAAACATGAAAGCGTGAATACTCAAGATACTGT 
TCTTGTATTTACAAATAAAGGTAGATATTTGTTTATACCTGTTCATAAATT 
AGCCGATATCCGTTGGAAAGAGCTTGGTCAACACATATCACAAATTGTGCC 
AATAGATGAAGATGAAGAAGTGGTAAATGTATACAACGAAAAAGATTTTAA 

35 AAATGAAGCCTTTTATATTATGGCTACAAAAAACGGCATGATTAAGAAAAG 
TAGTGCTTCACAATTTAAAACTACTCGGTTTAATAAACCACTCATAAATAT 
GAAGGTTAAAGACAAAGATGAACTTATTAATGTCGTTCGATTAGAGTCTGA 
TCAGTTAATTACTGTTCTAACCCATAAAGGCATGTCATTAACTTATTCAAC 
TAATGAATTATCGGATACAGGCTTAAGAGCAGCTGGTGTTAAATCAATTAA 

40 TCTTAAAGATGAAGACTATGTTGTTATGACAGAAGATGTGAACGACTCAGA 
TTCCATAATAATGGTTACACAACGTGGTGCTATGAAGCGTATTGATTTTAA 
TGTTCTTCAAGAAGCTAAACGCGCACAACGTGGAATTACTTTACTAAAAGA 
ATTAAAGAAAAAACCGCATCGAATTGTGGCAGGTGCAGTAGTTAAAGAAAA 
TCACACGAAATATATTGTATTCTCTCAACATCATGAAGAATATGGTAATAT 

45 CGATGATGTACACTTATCTGAACT^TATACTAATGGATCATTTATTATTGA 
TACTGATGATTTTGGAGAAGTAGAAAGTATGATTCTAGAGTAAAAGTATAT 
GCAATCACAAAATAAAATGATAAAATAAAAATTAATAATAGCAACTAAAAT 
GAGCACAAGGGATTTATGTCTAATTTGAACTATCAAAGTGATAATATACAT 
TTATTGTTAATATGAATGTTTAATCTTTGTTAATGAAATTAAAGCATAAAT 

50 CTCTTGGTGTATTTAAAACGATAAATTTAAGTGGATGAGAGGGATTCGTTT 
TGCAAGATTTTGATAACTTAATTCCTGGCTGGTTTAAAACATTTGTTCAAG 
TCGGGAATGACTTAATTTGGTCTCAATATCTTATTGGATTATTATTAACAG 
CAGGTTTTTTCTTTACAATTAGTTCTAAATTTATTCAACTCAGAATGTTAC 
CAGAGATGTTTAGAGCATTAACTGAAAAGCCAGAAACTTTAAGTAGTGGTG 

55 AGAAGGGTATTTCACCATTTCAAGCTTTTGCGATTAGTGCTGGGTCAAGAG 
TAGGAACTGGAAATATTGCCGGTGTTGCAACTGCTATTGTTCTTGGTGGCC 
CCGGTGCAGTCTTCTGGATGTGGATTATTGCTTTTATTGGTGCAGCTAGTG 
CATTTATGGAAGCAACGCTTGCTCAAGTTTATAAGGTACATGACAAAGAAG 
GTGGATTCCGTGGCGGACCAGCCTATTACATAACAAAAGGGCTAAACCAAA 
AATGGCTTGGAATTGTATTTGCTGTTTTAATTACAGTTACATTTGCTTTTG
TATTTAATACTGTTCAAGCGAATACAATTGCTGAATCATTAAATACACAAT 

ACAATATTAGCCCGGTAATTACTGGAATAGTACTTGCAGTTATTACAGGTA 
TTATCATCTTTGGTGGTGTTCGTAGCATAGCTACACTATCTTCACTTATTG 
5 TGCCTATTATGGCTATTGTTTATATAGGTATGGTTTTAATCATTTTATTAC 
TCAATATAGATCAAATTGTACCTATGATTGGCACTATTATTAAAAGTGCAT 
TCGGAGTTCAGCAGGTTACTGGTGGTGCTGTAGGAGCTGCTATTCTTCAAG 
GTATTAAACGTGGTTTATTCTCAAACGAAGCTGGTATGGGATCTGCACCTA 
ATGCTGCTGCTACATCTGCTGTGCCCCATCCCGTTAAACAAGGTTTAATTC 

10 AATCATTAGGTGTATTCTTTGACACTATGCTTGTTTGTACAGCTACAGCAA
TTATGATTTTATTATATTCTGGTTTGCAATTTGGTGATAGCGCGCCTCAAG 
GTGTAGCAGTTACGCAATCAGCGTTGAACGAACATTTAGGTTCAGCAGGAG 
GTATTTTCTTAACTGTAGCAGTTACCTTATTTGCATTTTCATCTGTTGTAG 
GTAACTATTACTATGGACAATCCAATATTGAATTTTTATCTAACAATAAGA 

15 TGATATTATTTATTTTTAGATGTTTTGTAGTACTTTTAGTATTTGTAGGTG
CTGTTGCTAAAACAGAAACAGTTTGGAGTACTGCCGATTTATTTATGGGTC 
TTATGGCAATAGTAAATATCATATCAATTATAGGTTTGTCGAATATTGCGT 
TTGCAGTGATGAAAGATTATCAAAGACAG 


Sequence 3393 

step.1002e02.cons.ok

TGGAGCTGGTAATTTATATATGACGCCACATGATATGGGCAAATTAATTTA 
TACGTTACAACAAAATAAAATCTTTAATGCACGTCAAACTCGACCTATTTT 

25 ACATGAATTTGGAACTCAAGAATATCCAGAAGAATATAGATATGGTTTTTA 
CATAACTCCGTATTTAAATAGAGTCAACGGGGTATTCTTTGGTCAAATTTT 
TACTGTTTACTTTAATGATCGGTATATTGTCATTTTAGGGACGAATGTAAG 
TAATACACCTGGATTAGTGAGTAATGAAGACAAAATGAGACACATTTTCTA 
TAATATTCTTGACCAGAAAAAGCCTTATAATACAGCAGGTGTTAAAGTTGA 

30 GTAATTGTAATTTTGTTAATAATGGATTAGACTCGTTATTCCATATTTACA 
AGGTTAGTCTTCAATAAATTATATATAACGAATGCCCGGAACTTCAATCTG 
AAGTGTAACCGGGCATTTGTTATATGATTAATAAAATAACATTCAAATTGC 
AATTTAATTATTTACAATAAGAAAGGGCGTTTAAGAATAAATCCTCATACT 
GTCTTTCAACAGGACTTAAAACAACTGTACAATTGGGATGCTCCATATTAA 

35 AGTCTACCACTGTGGCTCCCCTTGTAAAGTTGCCATTTAATTCTATTTGAG 
TATACGCCTCCTTGACATTAAAGGCTTCTGGATCCAAC7VAATATAATATAG 
TGAATACATCGTATAATTTAAATCCTATTTCAAAATCTTCACTCTTATAGT 
GCTGAAATAAATTATATAACATGTTTGAAGTTGCATTTGTATCTTTGAAGT 
CTTTTACAAAATGGTGAGTAAACAATGCTTCACGAGCCAAATCAAGACCAA 

40 TCATAGTCAAAGGTAATCCAGAGTTAAATACAATTTGAGCAGCTTCTGGAT 
CACAATATATATTAAATTCAGCTAAAGGCGTTACATTACCTCTACCGGTAC 
TACCACCCATTAAAACAATTTCCTTAATAAATGGTTGAACTTCGGGATAAC 
TAGTTAAAAGAATAGCGATATTTGTTAGTGGACCTATGGCAATCAAGGTTA 
AAGGTTCTTGAGTATTTACTAATAGATTTCTCATTGCTTCAACTGCATGAA 

45 TTGATGTTAAATCATCTTGATTTATTTGTGGAAACTCGTAACCATCCATAC 
CAGACTCACCATGAATTGATGTAGCTTCAAAGATGTCATTAATCAATGGTT 
GGGATGCCCCTCTATGTACAGGAACAGAACTATTAAAAAACCTTTTTAGCT 
TTAATGCATTTGCTGTCGTTTTCTCAATACCTACATTACCATTCACAGTTG 
ATATCATTTTAACGTCAAATTGAGGGTGTGAAAGTGCGATACTAATTGCTG 

50 TAGCATCATCTATACCAGGATCAGTATCAATTATAATGGGAATAGACATAT 
CATCATCTCCGTATTTCATAATAATTAAAGCATAACATGGTATGGGATTGT 
TACGATAACGAAATAAAAAGAGCCTGAGACTTTTAAGTCTCAGGCTCTTTA 
CATATAAGGTGGAAATGAATCAATAAAGAACAAATATATTCAAATTCATTC 
TTTATTGAAATCAACCCTGGAAAACTCATCTTTATAAGTGAGAAGAGTGTC 

55 CACCTTGCATTACCCAGTATGTTCCGATAACAGTTACTAAAGTAATGATAA 
TTGCAAA6ATAACTTTGAACGATTGTAAACGTCCATCTTTACCTTCAGTTA 
AATGCATGAACATTAATAATTGAAGGGCAGCTTGAATGAAAGCAAAACCAA 
AGATGATAGTTACCTTAGCATGGAATGTCATATTAGTGTATAGAGTTACAA 
AAACTGCTAAAAGCGTTAGTACGATAGAAGCAATAAAACCTACTGTATGTT 
TTACGATTGTATTCATCCGCTATACACCATCCCTATCATATATACGGCAGT
AAAGATGAAGACCCAAACAACATCTAAGAAGTGCCAGTATAAACTTACTAT 
AAATAATTTAGGAGCATTGTATGAATCCAAACCACGAGTGCCGATTTGAAT 
TAACAAACAAATAACCCAAACAATACCTAATGATACGTGTGCACCGTGCGT 
5 ACCTAGTAGTATAAAGAAACTAGACCAGAAGGAGCCAATAGTTGGGTTAAC 
ACCTTCAGAAGCATAGTGTGCGAATTCGTAAATTTCGAAACCTACGAATAC 
AAGACCrrAGGATAACTGTGATAATCATCCAAAACATCATTAAGTTTTGTTT 
TTCTTGTCGCATGTAAT/^TTGCAATACCACAAGTATAAGAACTAATTAA 
TAATGCAAAAGTCATTATTAAAATCAAATGTAATTCGAATAAGTCGGTAGT 

10 TAATTTACCGCCATATCCGCCAGOATGTTGTAACGTTAATAACGTTGCAAA
TAGGGTACCGAATAACGCAAATTCAGCTGTAAGGAAAATCCAAAAGCCAAG 
TTTATTTAATTCGCCTTCGTGTGTACGAGAATCAATAGTATTTGCATCATG 
ACTCATGACTTACAGCCTCCCTTTCTTTAATTCGAGCTTCTCTTAAACGAG 
CTTCAGTTTCTGCAACTTCTGAAGCAGGGATGTGGTAACCATGATCAATTT 

15 GGAAACTTCTCCAAATCATAGTAATGAAGATACCTGCTAAACAGATAAGTG 
CTGGAACAATAGATTCGAAGATTAAGAAGAAACCACCAATAGTCATAAATA 
TACCCATCCAGAATCCTACTGGAGTATTGTTTGGCATATGAATATCTTTGT 
AGTTATGGTTGTCTAAATAATGACGACCATGTTCTTTCATATCAACGAATG 
TATCGTAGTCATTCCAATCAGGAGTGATAGCAAAGTTGTATTTAGGTGGAA 

20 TAGCTGATGCTGTAGACCATTCTAAAGTACGACCAAGTCCATCCCAGTTAT 
CTCCAGTAGCTTCACGTGGAGCTTTGATATGACTATAAACGATACTTGCAA 
CTAGGAATAAGAATCCAATTGCCATCAATACTGCACCGATAGTTGAGATGA 
AGTTTAGTAACCACCAACCATCAGAAGGCATGTAAGTGTATAGACGACGTG 
GCATACCATCTAAACCTAGAATGAATTGTGGTAAGAAACAAACGTTAAATC 

25 CGATCATGAAGAACCAGAAGCACCATTTGTTTAATGTTTCATTTAACTTGT 
AGCCCATCATTTTTGGATACCAGAAGATTAAACCAGCTAAGCAGGCAAATA 
CTACACCAGTAACCAATGTATAGTGGAAGTGAGCTACTAAGAAATAAGTGT 
TGTGATATTGATAGTCAGCTGATGCCATTGCAAGCATTACACCAGTAACCC 
CTCCTAATAAGAAGTTAGGGATGAATGCTAATGAGAATAGCATAGGTGACT 

30 CAAATGTAATTCTACCTTTGTATAATGTGAGCAACCAGTTAAATAGTTTAA 
CTCCGGTTGGAACACCGATTAACATTGTTGAGATAGAGAAGAATGAGTTAA 
TTAACGCACCATTACCCATAGTGAAGAAATGGTGAACCCAAACTAAGAAAC 
TTAAGAATGCGATACCTGCAGTTGCCCAAATCATACTTTGATGACCGAATA 
AACGTTTACGGGCAAAAGTAGGGATGATTTCTGAGTACATACCGAATGCTG 

35 GCAAAATAACGATATAAA 
40 GCATTAGCACTTAAAATGGATAACAAGCCAAATATTGCTACCGCAACAGTT
GGAGAAGGCAGTTCAAATCAAGGTGACTTTCACGAAGGTATGAACTTTGCT 
GCAGTTCACAAATTACCTTTCGTCTGTGTAATAATTAACAATAAATATGCG 
ATATCTGTACCAGATTCACTACAATATGCTGCTGAAAAGTTATCAGATCGT 
GCATTAGGTTACGGTATGCATGGAATACAGGTAGATGGAAATGACCCAATT 

45 GCAGTATACAAAGCGATGAAAGAAGCAAGAGAACGAGCGCTAGCAGGTGAA 
GGTCCAACATTGATAGAAGCTGTCACTTCACGTATGACACCACATTCATCT 
GATGATGATGATACATATCGTACAAAAGAAGAAAGAGACCTATTGAAACAA 
GAGGATTGTAATATAAAATTTAAAACGGCCTTACTCGATCAAGGCATCATA 
AACGAAAATTGGTTGAGTCAATTGGAAAAAGAGCATAAAGAACTCATTAAT 

50 GAAGCTACTAAATCTGCTGAAGCAGCACCATATCCTTCAGAAGAAGAAGCT 
TTGACATATGTTTATGAAGAGGGAGGTCAACGAAATGACTAAATTATCATA 
CTTAGAAGCTATACAAAATGCACAAGACTTAGCACTAAATCATTTTAGTAA 
TGCATTTATACTTGGCGAAGATGTAGGAAAAAAGGGTGGCGTTTTCGGCAC 
AACCAAAGGATTACAAAGTAAGTATGGTGATGAACGTGTAATTGATACACC 

55 TCTTGCGGAATCGAATATTATTGGTACTGCTATTGGTGCAGCAATGCTAGG 
TAAGAGACCTATTGCCGAAATACAATTTGCAGATTTCATTTTGCCTGCTAC 
AAATCAAATTATAAGTGAAGCAGCTAAAATGAGATACCGTTCAAATAATGA 
CTGGAATTGCCCACTGACTATCAGAGCACCTTTTGGTGGTGGAGTTCATGG 
TGGATTATATCATTCACAAAGTGTTGAAAGCATTTTTGCTTCAACTCCTGG 
ATTAACTATTGTTATTCCTTCATCACCTTATGATGCTAAAGGTCTTCTATT
GTCCTCTATAGAGTCTAACGATCCAGTCTTATACTTTGAACATAAAAAAGC 
ATATCGTTTTCTAAAAGAGGAAGTTCCCGAAACATATTATACTGTACCTCT 
AGGTAAAGCAGATGTTAAAAGGCCAGGCGAGGACATCACTGTATTTTGTTA 
5 CGGATTGATGGTGAATTACTGTTTACAAGCTGCAGATATTTTGGCAAATGA 
CGGCATCGATGTTGAAGTAGTCGACTTAAGAACAGTTTATCCACTAGATAA 
AGCAACTATCATTGAACGCTCTCAACGTACTGGTAAAGTTCTTCTTGTTAC 
TGAAGATAATCTAGAGGGAAGCATTATGTCTGAAGTATCTGCAATTATAGC 
TGAAAACTGTCTGTTCGATTTAGATGCGCCAATCATGCGATTAGCTGCACC 

10 GGATGTCCCATCTATGCCATTTTCACCAACATTAGAAAATGAAATTATGAT 
GAACCCAGAAAAGATACAGGACAAAATGCGTGAACTCGCACAATTTTAAGG 
AGGGTGTCAATTGGATATAAAAATGCCTAAGCTTGGTGAAAGTGTGCATGA 
AGGTACGATTGAACAATGGTTAGTATCAGTAGGAGATCATGTAGATGAGTA 
TGAACCATTATGTGAAGTTATTACAGATAAAGTAACAGCTGAAGTGCCTTC 

15 AACAATTTCTGGAACAATAACAGAATTAGTGGTTGAAGAAGGACAAACTGT 
CAATATTAACACGGTGATTTGTAAAATCGATTCGGAAAATGGTCAAAATCA 
AACAGAATCGGCAAATGAGTTTAAGGAAGAACAAAATCAGCATTCTCAATC 
AAATATAAACGTGTCACAATTCGAAAATAATCCTAAAACTCATGAAAGTGA 
GGTGCATACAGCCTCTAGTCGCGCAAATAACAATGGACGATTTTCACCAGT 

20 TGTCTTTAAATTAGCTTCTGAACATGATATTGATTTAACACAAGTCAAAGG 
AACTGGTTTTGAAGGTCGTGTTACTAAGAAAGATATTCAAAATATTATTAA 
CAATCCAAACGATCAAGAAAAAGAGAAAGAATTTAAACAAACAGATAAAAA 
AGATCATTCAACGAACCATTGTGACTTTTTACATCAATCCTCAACTAAAAA 
CGAACACTCACCATTATCAAATGAACGTGTCGTACCAGTTAAAGGTATTAG 

25 AAAAGCTATCGCACAAAATATGGTTACTAGTGTCAGCGAAATACCACACGG 
TTGGATGATGGTTGAAGCTGATGCAACGAATTTGGTTCAGACTAGAAACTA 
TCATAAAGCTCAATTTAAACAGAATGAGGGTTACAATTTAACTTTCTTTGC 
GTTTTTTGTAAAAGCTGTTGCAGAGGCTTTAAAAGTAAATCCATTACTCAA
TAGTACATGGCAAGGAGATGAAATTGTTATCCACAAAGATATTAATATCTC 

30 TATTGCTGTTGCAGACGATGATAAGTTGTATGTGCCAGTCATTAAAAATGC 
AGATGAAAAATCAATTAAAGGTATCGCGCGTGAAATCAATGATTTAGCTAC 
TAAAGCAAGATTAGGAAAATTAGCACAAAGTGATATGCAAAACGGTACATT 
TACGGTTAATAATACTGGTTCTTTTGGTTCTGTTTCTTCAATGGGAATCAT 
TAATCATCCACAAGCTGCCATTTTACAAGTAGAATCAGTCGTTAAGAAACC 

35 TGTAGTTATAGATGATATGATTGCAATTAGAAATATGGTTAATTTGTGTAT 
TTCAATCGATCATCGTATTCTCGATGGTGTTCAAACGGGAAAATTTATGAA 
TCTTGTTAAGAAAAAAATAGAACAATATTCTATTGAAAACACTTCTATTTA 
TTAATTACAAACAATGATTACCAATTAAACTTCATATTAAAGTTGAACGCT 
TTAAGGAGGATAATTTCTTCTTAAAGCGTTCTTCAGTATTCGTTTTAAATT 

40 TCAACTAGAATTTTTCAATCATCATAAATTAATCCTGTTGAACAATGTGTT 
GTTAATTAGTAGAATAAAATAAGTAAAATTAACTTATTATGGAAGGTGACA 
AATCATGGATTTAAACTTTGATTTATATATGAATGATGTTGTTGAACAAGC 
ACGAAACGAAATTGAGCATGCTGGTTATCATCAATTAACTTCTGCTGAGGA 
TGTTGATCAAGTTTTACAACAAAAGGGGACATCTTTAGTCATGGTAAATTC 

45 CGTA 



Sequence 3395

step.1002e04.cons.ok

50 AGACCGTCTACAATATACAGTTTTTACTGTTGAAACACACACTACTTTTTT 
ACATGAATTAACACTTGGTCAAGAATTCAATATAGAGCTATACCTTTATAA 
TTACGATGACAAACGAACTCATTTCTTTCTGAGAATGTTAATTGATAATCA 
AGAAGTTGTTGCAACCAATGAGGTCATGATGTTAGGAATAGATAGGACACA 
ACGTCGTGCTGCACCATTTCCAAAACATTATCTGAATGCTATACAAGATTA 

55 TGCTCATAAGCAAGAAAAAATTGAATGGCCACCACAATTAGGCCACTCAAT 
TGGAATACCATATAAAGGAGAATAATAATGTCTATAAAAGAAAAATTAGTA 
ATAGAGCTCATGAATTTAAAAGACACTGTAAAATACGAATTGTCAGAATTA 
AATGAACAACAATATTTGTTTATGAACGGCCCAGCACCTCAGTTGATGAAG 
CGTGCCATGAAAATGAGCTATATTCTAGGCCAAAAAGAAGCGATTGATCAC 
TTTCTATTACTCATCGACACTTGCCATGAGAATGTCTTACTAGATAACTGC
CAAGACTATCAAAAACAAATTCAACATAAACAATTCTCTTTACAAGAAGAT 

ATTATTCACCAAATAAATCAGTCAAAGGAATTCGAATCATTTTTATCAACT 
TATTATGTAAATAAAGGTAAATATGATATCACCAGCAAAATAAAAACATTA 
5 CTTGAATAAATACTATTGTTTAAATAAAATGATTATCTCATACATCACTAG 
CTGATAAGCATAGCATTGGTAATTGACTTCATCCTTCCTTTTAGAACAACA 
TTCACATCACTTTTAAAATACAGTTAAAAATCTATTCAATTCTTACTCTAA 
GATTGATTTAGTGATGTGTTTTAACACCTTGTGAGATTGGCGTATGGGATA 
GATAGGAAAATCATGCACCATTTTAGGATACTCTTTAAACTCAACATCGTT 

10 CCCACATTCCTCTAGAGCATGAGTAAATGCATGCATGTCTGGGCTTAATAT
TTCTCTGCCACCTCCATACATATATATAGGTGGCAACCCTTTTAGAGTACC 
GTATAATGGTGATATACGCGCGTCTGATAAAGGTAAATCATTCGTCCAAGA 
TTTCATGAGTTGATGAACACCAAATCGACTAACTAAGATATCATTTTCTTC 
CAACGTTTTCGTAATATTAGGATTAGTCAAAGTGGCATCTAAAAGCGGTGA 

15 AAGTAAAAATAACTTTCTAGGCACTTCTTGATTATCATTGATAAGTGATTG
AACAAAACTTAATGCTAGTCCTCCACCAGAACCATCCCCCATCATAACAAT 
GTTGTTCGCACCAACTTCTTCAACTAATTGATTATAAACGTCACGAATCGC 
TTTGAACGTTTCTAAGTAGTGATAATCTGGTGACTTTGGATAGATAGGTAA 
AACGACCTCATGCAATGTATTTAAAGTTAGTTTATCTAAAAGTCTCCAATG 

20 AAATGGAGAAGGTTGCAAAGTATTATATCCACCATGCAGATAGAGTATCTT 
TTGATTTTTCTCATGTCTAAAATTAAATCTAAACACTTGCATATCATTAAG 
AGTAAGTTTATCTAAATTTGACTTTACATTAAGTGTCGCAGGTTGCTGATG 
TTTTTTACTATTTTCGATACTTCTCTTTTCTAAAAAATGTTTAACTTCATC 
ATCATTACTAAAAAATATCGATCTATTGTGAAGTATATATTTATTAACCAC 

25 GCGATTCATCATTCTATTTTTCATTAATCAATCAACCTACCTCTATTTAAA 
ATTTTAAATTTAAGAAATAATTTGTTTAAATATCTATATACCTTTGCTAAA 
AAAAGATATAATGATAAGTATGTCATTTTAAAGAAATATTTTTTAATCAAT 
ACTTAGGATACTAGATTAAACCCAATTATAACAGACATAAAATTAAAAAGA 
TAAACTTAGTAAATTTGCATCTAAAGGAGGCATTTAAATGTCTGAAACATT 

ACAAAACCAAAGACAATTCAACTCTCAATTTGAACATCAAGAGATCCATCG
TGGAAAAAGATATGGTAAAAAGAAACGCTCATGGGTAAGTCTCATTATTCA 
AGTTTTCGTTTTAGTATTAACTGCTATTACTGGCTATAGTATGCTTAAACA 
ACCTATATTTAAAATTTCATTTGTAAATGAAACTATAAATTTCCATCAATT 
AAGAAATTTTCAAGATACAGTTACGCAAATCGGTAATTTGAATTTAGGTAA 

TATTGATCAATTACAACAGTCTGTCGATAATCTCATCATTATATTTAATAT
TTTCTTTGTCCTATGTCTGATTAGTTTGTTTATAACCGTAATTACAATTAT 
TTTTAATCGCACAGCACTCAAAGTAGTCAATATATTACTTTTTAGCTATTA 
TGCTAGTCATTACATTGTATTTCAGTTATATCATACATACAATTGCTCAGA 
AAATTTCCGAATCTCTAAAACAATATTATTTAACAGTATCACCAGAACAAG 

TATTAACTGAAGCAGATGCGATTCACAATGCACTGATACTCATTGGATGTA
GTATTGCGTTATTGATTGTAAGTTTGTTTTTCCGTAATCGTTTACCACGTA 
TTAAATAAAGATGATTTTAATGTTCTTTTAA 



Sequence 3396

step.1002e05.cons.ok

CCATAGCAACATTTTCAGTTATGAATTGATTACTTTGTATGGGATGTGTAA 
CTTCTATCACTTTCATAATATTATTTCTCCTTTAATTAAAAACTTTTTTAG 
GTTTTATTTCATAAGGTTTATCTGGATGAACTATATATATTGCTAAAACTT 

TTTGAGTGCTACTATCGACAAATATAAAAGGTTCTTTAACATTTTGACTTA
ACACTTTTTTATGAAATTTTTGACCGTTACAGATTTTCTTTTTAAAATTTG 
AATCTTTCACTTGGAATGATTTCAGACCTTTTAAGCCATATTCTATAGGAA 
ACAATTCATTATGTAATGAATCATGCTCATGTAATTCTTTAATTTGATCAA 
TCGTTAAACTACTTTCTAATTGAAAACCGCCAGAAGCAATTCTAGTTAGAC 

GTGACATATGAGCTGGAAAACCAAGTTTAAGTCCAATATCTGTAGCTAAAG
TTCTAATATAAGTTCCTTTACCACATGTTACTTCAACTTCAAAATGACATG 
TCTGCTCCTGAAAAGTAACTTCAGATATTCTATGTATATCTTTAATAAAAA 
CTTGTCGTTTAGGGCGTTCGACAGTTTCATTATTTCTCGCATATTCATATA 
ATTTTCTTCCATTTACTTTAACAGAAGAATACATAGGCGGAATTTGTTGAA 
TATGCCCCTCAAATTGCTGCAACACTTGGTCAATCGTATCTTCATTAATAT
CATTCTTATCAACAGCCCTAGTTTCTAAAATATCTCCAGTTTGGTCTTCAG 
TAGTCGTACTCTTTCCTAGCGTTATCATAGCGTGATAAGTTTTTCCCATTT 
CCATGATATAATCACTGACTTTTGTCGCATCGCCTAAACAAATTGGTAACA 
CACCATTAACTTCAGGATCTAATGTTCCTGTATGACCAATTTTTTTCATTT
TTAAAATTTTACGTAATTTAAAAACGACGTCGTGACTTGTTAAACCTCGTT 
TCTTAAATACCGGTAGTATGCCATTATACATGTTGTCACACCTTTAACTTT 
TAATAACGTGTATTATTATAACACGATATTTTAATTCTTTAATTAGCGTTT 
TACATAAAAATAGTAGGCAATATCAAAAATATGCTTTATGAATGATTGTGC 

CTACTATAAAATATATTATTTATCATTTTTGTGTAACTCTTGAATCATGCG
TTCTATCTTATTACCGTATTCGATAGATTCATCATATTCGAATGTTAACTC 
AGGTATAATTCTTAGGCGCATTCGAGAACCAAGTTCAGATTTTATAAACCC 
AGTTGCTTTATGCAAAGCTTTAAACGTATTATCAACTTCTTTATCATTCCC 
TAACACTGTTAAATATACCTTTGCTTGTGAAAGGTCATTGGTTAGTTCAAC 

ATCAGTAATTGTTAAAAAACCAACTCTAGGGTCTTTAACTTTATTATTAAC
AATGTCCATGATTTCCTGTTTCATTTGTTCTCCTACACGTTCTGCTCTTAT 
ATTATTCATCATCTCACCTCTCTTTATTTTTACTTTACTCACAAATAAATG 
TGAAATATCAGATTTAATGATTAACTATTGATGTTCATTTTATTAAATTAT 
CAACAACGAATCAATAAAATAATACTAAATGAGAGACCATAACTAAAGTAT 

TGATAGCATAACATTAAGCTTTTCCAAATCAAAAATTTATATCTTCAATTT
GATTGTTGGTTAAGCAAAGTATGTCATACTCTCCTTTAAGATATTAGGTTT 
ATGAAATATCTTAAAGTACTTAACTTAGCATATAGCCTCTCAATAAGTTTC 
AAAAATTATCTTTGAATTTCTACCATTTCAAACGCTTCAATAATGTCTCCT 
TCTTTGAGATCATTATATTTTTCAATTGTAATACCACATTCATAGCCTTGA 

GCTACTTCTTTAGCATCATCTTTGAAACGTTTTAATGTGTCAAGTTCACCT
TCAAATAACACGATACCATCTCTAATTACGCGTACACCAGCGTTACGAGTG 
ATTTTACCTTCAGTCACATAACTACCAGCAATTGTACCAACTTTAGAAACT 
TTAAATGTTTGACGCACTTCAGCTTGTCCAATGACTTGCTCTTCAAATTCT 
GGGTCAAGTAAACCTTTCATAGCTGATTCTATCTCTTCAATAACATTATAG 

ATAACTCTGTGTAATCGCATATCTACATTTTCAGCTTCAGCCGCACGTTTC
GCACCTGCATCTGGGCGTACATTAAAACCAATAATAATACCATTTGATGCA 
TTTGCTAATGTAACATCTGATTCATTGATAGCACCAACAGCTGTATGAATA 
ATTCGTACATTCACACCTTCAACATCTATTTTCATTAGAGATGCGGCCAAT 
GCTTCAACTGAACCTTGTACATCACCTTTAATGATGACATTTAAATCTTTC 

ATTTCACCTTGTTTCATTTGCTCAAATAAATTGTCTAATGAAACATTTTTA
CTTTCTTGACGTTGCTGTATGACACTTGCCTCATGACGTGCTTCACCAATT 
CGACGTGCTTGTTTTTCATCACCAAATACAACAAAACGATCACCTGCAAGT 
GGAACATCGTTAATACCAGTAATTTCTACAGGTGTTGAAGGACCGGCAGAT 
TTAATTCTTTTTCCTAAATCATTAACCATTGCACGTATACGTCCATAAGTA 

TTACCTACAACAATTGCATCTCCAACGTTTAAAGTACCATTTTGAACAAGT
AAAGATGCAGCTGGACCTCGTGATTTATCTAATTCAGCCTCAATCACAGTA 
CCTACAGCTTGTTTATTAGGATTAGCTTTAAGTTCTTGTACCTCCGCTACT 
AAACCGATCATTTCTAATAAATCATCAATACCGTCTCCACTCAATGCAGAT 
AGTGGTACAAAGATTGTGTCACCGCCCCAGTCTTCTGGAATTAATCCATAC 

TCAGTGAGTTCTTGCATAACACGATCAGGGTTAGCAGTTGGTTTATCAATT
TTGTTTACTGCAACAATCGTAGGTACTTCTGCTTCTTTAGCGTGATTTATA 
GCTTCAATTGTTTGAGGCATCACACCATCATCAGCGGCCACGACTAAAATT
GTAATATCAGTAACTTGAGCACCACGTGCACGCATAGTCGTAAATGCAGCA 
TGTCCAGGAGTATCTAAGAACGTAATTTTTTTACCTGAATTTTCAATTTGA 

TAAGCACCAATATGTTGAGTGATTCCGCCAGCTTCTCCTTCTGTAACTTTA
GTGTTACGAATAGAATCTAATAAAGTCGTTTTACCATGGTCTACGTGGCCC 
ATGATTGTAACAACTGCTGGACGTTCAATTGCATCAGAATCATCAGTCTCA 
TCATCAAAATAAATTGATAAATCTTCTTCATCAACGACTACTTCTTTCTCT 
ATTTCAACGCCATAGTCATCTGCAATTAATTCTAATGTTTCTTCATCCAAT 

GATTGATTGATATTAGCCATAATACCTAGTAAGAACAATTTTTTAATAATA
CCAGCTGATTCTACATTTAGCTTTTCAGCTAACTCACCGACAGTTATGCCT 
TCTTGATAAGTGATTTTAGAGGGCATTTCTTTTGTTTCTGCTACCTCATTT 
TGAGGTTTATTATTTTTATTATTCTTTTTATTTTTATTGTTTTTTTGATTC 
TTATTAGTTTTATTGTTTTTATTTTGCTGTTTTCCTTTGTTGTTT^ 
TTTTTCTTCGTTGGTTTACTATTATTCTTACTTTGTTGTTTTTCTTTATCG

TTAGAATTTTGTTTATTATTAGATTTTTGGTGATTATTTTGAGTATTTTC 

TTATTAGTGTCTTTCGCT 


Sequence 3397

step.1002e06.cons.ok

AAAATTCAATATACGTTTCTTTTATAATTTTAGTAATTTCTCTTTAAAAAT 
ATCACGTAAAAATTCAGGCATAATATAATCTGTAGATAACCCCTTTAGCAA 

TGGGAAATATCTTCCAAATGTATACATATCAACAGTACCTAACGTATATTG
TTTATCATTTTGAATCTCTTCATCGCCTATAAAATAACGGTCTTCTGACTC 
AATAAAGCCTAGATTATATAAAATATATCCTCCAAATATATCCCCTCTAAA 
ACCAAGACCTTGTGCATGTTCGTGCATATATTCTTGCTTTTGGCTTTTTTG 
AATCACTTGCTTTAATTGTTTACCGGTTAATCGAACTCTTACAATATTGAT 

TGGATGGGGTAACATGCGATGTATATCATATTCCGTCACTTTATCAGCTTC
AATGCCATTAACTATAAGTCCAGCATTTACGATTGCACAATCAGCCCTTGA 
AAACTCATATACACTTTCAGCCAGTAAATACGATGTTCTTGTAACAACATC 
TGTTCTTTTGACTAAGTTCACATGATGATTAACTACTGGTTTACTTAGAAG 
TGCTCTTCCTTCTTCTTCAAAATGTGTCTCGACTAAGGGAAGTGTTTCAAT 

AGGATGAATTTTGGCGATTTTATCAACGATTTTTCCATTTTCAATCGTAAT
ATTAACTTCACCTAAATAATAGCCATATTTTCCGGCAGCTGCCATCAAAAC 
ACCATTGTTTATTTCTCCATGTTCAAAATGATGATGCGTATGACTACCAAA 
GATAACATCTATTTCCGGAATCTCTTGGCATAACTTTTCATCAAAAAAGAT 
ACCGACATGGCTCATAACCATTAAAAGATCATATTCACCTTGATGTGCATT 

GATTTCATCTTTGATTGCCGCTAATGGGTCAGTAACAATCCAATCCAGTGC
TCGATAAAAAGGTGTGAACGGTGCCGTTGCTGCAACAAATAAAATACGTGT 
TCCTTTTATTTCTTTGATATACGAAGAGGTAATATGATGTGGAAGATGTCC 
CTCTTCATCTATGACATTCGTGCAAATCACTTTAAAATCCGCGTCGTTATA 
TAGATTTTGTAAAGCATCATGAGAAATTGTCATTCCTTCATTATTTCCAAT 

GGTTGCAATATCACAATGTGCTTCATTTAAAAGTTCTATATTTTTATGTCC
TACCGTAGCTTCTGTCACAGGTGCTGATAAATCAACATGGTCACCTATATC 
TATATAGAGTGAGGGATGTTCAAGTTGCGGTCTATGTTTTGCCATATAAGC 
TTGAATACGAGCATATTCATTTAAATGACTATGAATATCATTCGTGTGATA 
TATTGTTAATTCCATAACATGATCCCCCTCTTTAAAATTTATATACTTAAT 

TGTACTTTAGCATTAAAGAACAAAGTCTCAAAGCATAACTACTTCGCATTT
AAATAAAAGATTTAATGATTAAATATACACCCATGATAAGCATTACTGTTC 
TCAATAACATTACAACTGTGTCGGATTTCATAGATCGATTGACCCTCACAC 
CTATTTGTGCACCTATAACACTTGAAATAATGAGAATGATAGAATAGCCCC 
AAGCCACATGTCCTTCAAAGATGTGCCCTATTGAACTCATCACACTTGAAA 

AGAAAATCATCATCATACTTGTGCCTACTGCAACATGTGGTGGAAATCTAA
AGACGATGAGCATAAGAGGGGTCATCAATGCACCTCCACCTATGCCAAACA 
GTCCTGTTAATAGCCCAATAAATAACGTTGTTATAAAAGCAAACAGGGGTG 
GTACACTATAACGGTATGTTTTCCCTTCAGCATCAACATATGTACGCGCAT 
ATTTAGGTTTGTCAAAAATTTTAAAAGGTTTGATTTTGTGACGAATCATCA 

GTAATATCGACACAAATATCATAAACACGCCAAAGTATAAGTTAAACGAAT
CAAGTGTAAGGTATCTACTTAAAAAAGAACCAACCAAAGATCCAGGTAGTA 
GTCCAAATAGAAATATCGAACCATTTTTTATATCTACTTGTTTGGTTTTTA 
AATATCCTAATGAGGAAGATAATCCAGTAACAATCAGAATAACTGAAGAAG 
TACCTATTGCAATTTGCGTCGTAATACCATGTAGAAGATGATGGTTAACGC 

CTAAATAAACAAGTGTGGGGACGATGATGATTCCACCACCAATACCAACGA
TAGATCCTAAAATCGCTGATAATCCCCCTATTAATATTAATAGAATAATTG 
TTAATAACATGACATTCGCCTCTTAAAATAGTTTTAATTGTTGCGGTGCTA 
AACCTTCATAATCAATGTCTAAAATATTTTGATACTTTTTAGCATTATTAG 
CTGCATGACCGCCAGAGTTATTATTAAAAATTACATATACTTTCTTAGCCT 

TTTGATTAAGTATTTCGACTTTACGAGCCAAGTCAGCTAACTCATCATCGC
TATAATCATATAAATATCTTACATCTCGCCATTCTTGATCAGTCATATCTT 
TTTTAGTCCAACCATAATGATTACGTCCATGATAACGTACAAAAGCAATTT 
CACTAGTAATCCTATTTACTAAAGGAACGCTCCCCTCTTTAACTTGAGGTT 
CATCTACCACTGCATGAATGATTTGATGTTGTGTTAAGAAGGATAAAGTTT 
GTTCTTTATACTGATTGTCAAACCATGATTGATGTCTAAATTCAATGCTCA
TCGGAATATCAGTTAATTGTTTTCTCACATATAAGATGTAATTTATATTTT 
GGACGTTACAATCAAACCAAGGTGGAAATTGTACGAGCACCATCGCCAATT 
TATTATGCTCTTTTAACGGATTGATCATTTCTTTAAACTGTTTAAATAATA 
TATCTCTCGACTCTGCATAATCTCTATAGTCAGCATGCAATGTTAATGCTT
GATGAATTTTAACAACGAATTGAAATCTATTCGGCGTTTCATTTATCCATT 
TAATTATATTTCGCTCTGGCTGTATCGCATAATACGAAGCATCCAACTCTA 
CTATTGGGAAGTGACTAGCATACGTTTTTAACTTATCTGTTTTACGCTCTA 
AATCCTCATATAATGTATCATGGTCACCCCAGCCTTCTAAAAACTCTGGAT 

AAAAATCACAATTACCTTTCACACGCTGATACAAACTAAGTTCTGTGTCAT
TATAACTAAATTCAGAATCCCCTAAATGTAAAAATTTATCTGCATCTTTAT 
GTAGTTCATAAATATGATATAAGATACCTGTTTCTTTGTGATTATCACTAA 
GAATAATCCACTTAGACATGTTGTTCACCCTCTAAATATTCTTTTAACAGT 
CGAATAGCATTTCCACGATGACTGATTTTACCCTTTTCATCATTTGTAATT 

TCCGCCATCGTTTTATTTAATTCTGGGACAAAAAATATTGGATCATAACCA
AAACCATTCTTTCCATGTCGTTCAGTTGTTATCACTCCAGAAACTGTTC 



Sequence 3398 

step.1002e07.cons.ok

TTTGGAAGTTCTGTTGATAAAATCAAATTATTCGCTGAATTGACTTCTAAA 
TATAAAGATTACCATCCTGAGTATCTGGGTAAGTTTAACAGATATGAAATA 
GATAAATATATTCAAGAAATTAGCAAAAGTTATAAAAAAAGAGGTAATGAA 
AATGGCTGAAAATCAAGCTAAAAAAGAACAAGTTCATACAGTATTCCAAAA 

TATTTCTCAAAAATACGATCGACTCAACAACATCATAAGTTTCGAACAGCA
TAAAGTATGGAGAAAACATGTTATGAGTACTATGAATGTACAAAAAGGCAG 
TAAAGCTTTAGATGTGTGTTGTGGTACTGCAGATTGGACTATTGCATTAAG 
CGAAGCAGTAGGCTCAAAAGGCCAAGTAACTGGTCTTGATTTTAGTGAAAA 
TATGCTTGAGGTAGGTAAACAGAAAACTGCTTCATTGGAAAATATCCAGCT 

TGTTCATGGTGATGCTATGAATTTACCTTTTGATGATAACTCATTTGATTA
TGTTACTATTGGTTTTGGCTTACGCAATGTTCCGGATTATTTGTCAGCACT 
CAAAGAAATGCATCGAGTTTTAAAACCAGGAGGCATGGTGGTATGTCTTGA 
AACAAGCCAACCGACATTGCCATTATTCAAACAAATCTATAGTTTATACTT 
TAAATTTGTGATGCCAATTTTTGGTAAAATGTTTGCTAAGTCCAAAGAAGA 

ATATGAATGGTTGCAACAATCAACATTTAATTTTCCTGACAAACAAACATT
GAAAGGCTTATTCTTCGAAGCAGGATTTAACGATATCATTGTACGTAGCTT 
TACAGGTGGCGTTGCTGCAATGCACCTTGGCTATAAAGAAAACAGTTCTAG 
CAAAGGTGATTAACGTGGCAAAGTTAAACATTAACAACGAAATAAAGAAAG 
TAGAAAAGCGACTTGAAG/^GCAATTATAAGTTCTGATCAAACATTACAAG 

AAGCCTCATTCCATTTACTATCTTCAGGGGGAAAAAGAGTTAGACCCGCTT
TTGTTATTTTAAGTGGTCAATTTGGCTCTAACAACAAACCTTCAGAAGACA 
CGTATCGTGTAGCAGTAGCTTTAGAACTAATTCACATGGCTACCTTAGTCC 
ACGATGATGTGATAGATAAAAGTGATAAACGTAGAGGCCGACTCACTATTT 
CAAAAAAATGGGACCAAAGTACAGCTATTTTAACAGGAAATTTCTTACTTG 

CTATGGGGCTCAAGCATTTATCTGAAATCAGTGATACTCGTGTCCATTCGA
CCATTTCTAAATCAATTGTTGATGTGTGTAGAGGAGAACTATTCCAATTTC 
AAGATCAATTTAATAGCAATCAAACAATTACTAATTACTTACGTCGTATCA 
ACCGTAAAACAGCACTTCTTATTCAACTGTCTACACAAGTTGGTGCGATTA 
CTTCCAATGCGTCAAATGACGTTATTCGTAAATTAAAAATGATCGGACATT 

ATATAGGTATGAGTTTCCAAATAATAGATGATGTGCTAGATTTTACTAGTT
CTGAAAAGAAACTTGGTAAGCCGGTTGGTAGTGACCTTATGAATGGTCATA 
TTACATTACCTGTACTATTAGAAATGCGAAAAAATAAGACTTTTAAAGATA 
AAATTTCACAACTTAATCCTGACAGTCCTCAACATGCCTTTGAAACTTGTA 
TAACAATAATTAGACAGTCCGAAAGCATAGAACAATCAAAACAAATAAGTG 

AAAAGTATTTAAATAAAGCAATCAATTTAATCGATGAATTAGAGGATGGTC
CTAATAAAGAACTATTTAGAAAGCTTATTAAAAAAATGGGAAGTCGAAATA 
AGTAAGTTTTGTGACAAAAGTTGAAAGCGCTTCATTATCCTGTTACCATAC 
TAAGTAGCAGGATATTTTTATAGCTTGCAATGTTTTAAGTTTTTTACATAC 
TTATAAAAATGAATGAAACACATGCACAGGGGGACACACGAAGTGGAACGC 
ACATTTTTGATGATTAAACCGGACGCTGTCCAAAGAAATTTAATTGGTGAG
ATTATATCTCGCATTGAAAAGAAAGGTCTTAAGCTTGTTGGCGGTAAGTTT 
ATGCAAGTGCCTATGGAACTAGCTGAAAAACATTATAGTGAACATGAAGGA 
AAACCGTTGTACGACAAGTTAATTTCATTCATTACTTCTGCACCCGTTTTT 
GCTATGGTAGTAGAAGGTGAAAATGCTGTAGCTGTATCTCGTAAAATTATT
GGAAGCACAAATCCAAGCGAAGCAGCTCCTGGTACAATCCGAGGCGACTAT 
GGTTTAAATCTAGGTCGTAATATTATACATGGTTCAGATTCAACAGAATCA 
GCACAACGTGAAGTTAAATTATGGTTTACAAGTAGCGAAATTGCTGATTAT 
AAAGAACCAAGAGAAGATTGGCTATACGAATAATAAAAATATCAAATAATG 

AAATTCATTTTATCCATTAATGTCTGTGACGCATATTTGTGTCTCAGGCAT
TTTTTATGATTAACCAAATTTTATTTGCTTAAAATAATACTTATTGAATTA 
GTCTATTTATTTTTTTATCTAGGGATTTAACGATGCAATACCGTATATTGT 
TCGATTATATAGCAATTTTATCTTTTGACAATTGAGCATTCATTATTAAGT 
AATGACCTACTTTTAATATATCAATCATGTATCTAATGATTTCGGTCTTTT 

GAGTTTTTTAATAGACTGATCTTTAATTGGTATAATGTCACTATGACATAA
CAATCATCTTGGTGCTATTCTTTTTTAAATATTTATAACTGCTAAAATAAA 
AATCTCCTTGTTTGTCTAACTGACTTATGTTGTGTGATTGAATATGATAAT 
ATATTAAACATATTACAATTTAATGTATTGAAGAATAGAAAGGGGCGTGCG 
ATCATGAGATACCTTACATCCGGTGAATCACATGGACCACAACTTACAGTC 

ATCATAGAAGCTTTAGAGCACAGTGGCGA



Sequence 3399 

step.1002e08.cons.ok

TTTAAATTGTCAGATGATCAAGAAAATGAAATTGAGCAATTATTAGATCAA
ACCAATCCTGATTTACCACGACCAGTAGGAGAGGATATTGTACATTATTCA 
GATTATTTTGAAGGTGCACAAAAGTATCTAAGTTATCTTAAATCAACTGTT 
GATGTTAATTTTGAGGGTCTTAAAATTGTATTAGATGGTGCAAACGGGTCA 
ACTTCTTCTTTAGCCCCATTCTTGTTTGGCGATTTAGAAGCGGATACTGAG 

ACAATTGGATGTAATCCAGATGGTTATAACATTAATGAACAATGTGGCTCT
ACTCATCCAGAAAAATTAGCTGAAGCTGTGTTAGAAACTGAAAGTGACTTT 
GGTTTAGCTTTTGATGGAGATGGCGATCGAATTATTGCGGTAGATGAAAAT 
GGACAAATTGTAGATGGAGATCAAATTATGTTCATTATTGGTCAAGAGATG 
TATAAAAACCAAGAACTCAATGGAAATATGATAGTTTCGACAGTAATGAGT 

AACCTTGGTTTCTACAAAGCTCTAGAAAAAGAAGGTATTCAGTCAAACAAA
ACTAAAGTTGGAGATCGCTATGTTGTCGAGGAAATGAGAAGAGGAAATTAT 
AATCTTGGTGGTGAACAATCCGGTCATATCGTATTAATGGATTACAATACT 
ACTGGTGATGGATTATTAACGGGTGTTCAGTTGGCTTCCGTTATTAAAATG 
AGTGGTAAAACTCTAAGCGAGTTAGCTTCTCAAATGAAAAAGTACCCACAA 
TCTTTAATTAATGTGAGAGTGACTGACAAATATCGTGTTGAAGAGAATATT
CATGTTCAAGAGATAATGACGAAAGTTGAAACAGAGATGAATGGTGAAGGA 
AGAATTCTTGTTCGTCCTTCTGGAACTGAACCTTTAGTACGTGTAATGGTT 
GAGGCTGCAACTGACGCGGATGCTGAAAGATATGCTCAAAGTATCGCTGAC 
GTTGTTGAAGACAAAATGGGCTTAGATAAATAATACTTTCATGTTTAAATT 

CATATATAAAACAGATTTATGATGTTACAGACATAAATCTGTTTTTTTTAA
CAGAATTTTATCTTTTAAAAA6ATTTGAAAAATATATTTTTTTATTTTGAG 
AATAATTCATCGATGTATCTATTCAGAACTATTACTATTTGTGAAAAATAC 
TGTTAAATCGTTTGCAAATTAATTCGTTGACAGTGTACAATGAATATTAAT 
GGAAAGGAGGAGAGTCTAGGTTATTAATAATTAAAGCGCCTGTGCAAAATA 

CCTAAATAGTTATCAAGTATTAAAAGTTAAAATAGTAGAACTTTTATAGAT
AACGATTTGGATTTTGTAGACGAGGAGGATAGTGATCGAATCAGATCGGCG 
GATGCTATCCCGGATGTGGCACATTCGTTAGCTTATTAAGTAAATCATTAA 
GGTGACTTAGTGGACAAAGTTAATAAGATCGCCAAACTAAATAAATTATTT 
TGGGAAACTACGGATAGCGTGTATAAAATTCATATAATAGCACGCTTATTT 

CATGATGTATTCGTTGATAATGATTATGATAGGGGATTATCAAATTTAAGC
AGTGATAATTAATCATAATAGCCGTGTTATCCCATTTGTTGGAGGATATTA 
TATGTGTGGAATTGTTGGTTATATTGGCTACGATAATGCCAAAGAATTACT 
ATTAAAAGGGTTAGAAAAATTAGAATATCGTGGTTATGACTCAGCAGGTAT 
TGCTGTAGTTAATGATGAT^TACAAAACTTTTTAAAGAAAAAGGAAGAAT 
TGCTGAATTACGTAAAGTTGCAGATAATAGTGATGAGGATGGTACGTTAGG
AATTGGTCATACACGTTGGGCGACACATGGTGTTCCAAATTATGAAAATTC 
ACATCCACACCAGTCAACATCTGGACGTTTTACATTAGTTCATAATGGTGT 
AATTGAAAATTATGAAGAATTAAAAGCTGAATATTTATCTGATGTCACTTT 
TTCATCAGAAACTGATACGGAAGTTATTGTACAATTAGTAGATTATTTTTC
TAGACAAGGATTAGCTACAGAAGATGCATTTACAAAAGTAGTTAAATTATT 
ACATGGTTCATATGCTTTAGGATTATTAGATGATAATGATAAAGATACTAT 
TTATGTGGCTAAAAACAAGTCTCCGCTTTTAGTAGGTGTAGGTGAAGGTTT 
CAATGTTATTGCTTCTGATGCTCTAGCAATGTTACAAACTACAAACCAATA 

CAAAGAGATACATGACCATGAAATAGTTATTGTTAAGCGAGACACAGTAGA
AATTAAAGATCTTGAGGGGCACATTCAACAACGTGATACGTATACGGCAGA 
AATAGATGCTGCTGATGCAGAAAAAGGCGTATATGATCATTACATGTTAAA 
AGAAATTCATGAACAGCCTGCAGTGATGCGTCGCATTATTCAAGAATATCA 
AGATGi\AAAAGGTAATTTAAAAATCGATTCAGAGATTATTAATGATGTAGC 

AGATGCTGATCGTATTTACATCGTTGCAGCTGGTACTAGTTATCATGCTGG
ATTGGTTGGTAAAGAATTTATTGAAAAATGGGCAGGTGTACCTACTGAGGT 
TCATGTAGCTTCTGAATTTGTATATAATATGCCACTTCTTTCTGAAAAACC 
ACTATTTATTTATATTTCACAATCTGGTGAAACAGCTGATAGTCGTGCTGT 
ATTAGTTGAAACAAATAAGTTAGGTCAGAAATCATTAACAATTACTAATGT 

TGCTGGTTCAACATTATCACGTGAAGCGGATCATACATTACTTTTACATGC
TGGACCTGAGATTGCAGTCGCATCTACAAAAGCATATACAGCGCAAATTGC 
TGTTTTATCTATCTTATCTCAAATTGTTGCTAAAAATCATGGTCGTGAAAC 
CGATGTTGATTTATTAAGAGAACTAGCTAAGGTTACTACAGCTATTGAAAC 
AATTGTTGACGATGCACCTAAGATGGAGCAAATTGCAACGGATTTCTTAAA 

AACTACTCGTAATGCATTCTTCATTGGACGAACAATTGATTATAATGTTAG
TTTAGAAGGTGCATTAAAATTAAAAGAAATTTCTTATATTCAAGCTGAAGG 
ATTTGCAGGTGGGGAATTAAAGCACGGAACAATCGCTTTGATTGAAGATGG 
CACACCTGTTATAGGTTTAGCTACACAAGAAAACGTTAATCTATCAATTCG 
TGGAAATATGAAAGAAGTACTTTAGAGCACAGTGGCGATGATATCA 


Sequence 3400 

step.1002e10.cons.ok

TTCCTCATAACGACCCTTTTCAACTTCTCTTGGATCATATTTTGGTTTCAT 

TTCCATATGTCTACCTCCTAAAAAATAAAAAGAACATCCATCCTATAAAAA
ATAGGACGGATGTTCTGTTCCGTGGTACCACCTATATTCAAGAAGAAATAA 
TTCATTCTTTTCTTCAAGCACTTAAGTCTCTGATTAACGCTCATACACGCT 
TCTACCTACTTGTCGTCTATGTTTCAATAGAAGTTATAAAATAGGGCTACC 
TTCAGACTTAATCATTTACATATTCACAGCCACCATATGCTCTCTTTAAAA 

TGATATAAACTTACTCTTCCCAAATAATATATTCTTTTACTCATTTATATT
ATAATGGATATTAAAATAAGTCAATGTATTGGAGATGAAATTATGAATGAA 
TGTGCTTTTAACACAACTGATCCTATCTATATAGAGTATCATGATTATTAT 
TGGGGACAACCAATCTATGATAGCAAAGAATTATTTAAGCTAATGGCTCTT 
GAATCACAACATGCAGGATTATCATGGTTGACAATATTAAAGAAAAAAGAA 

TCGTATGAACAGGCATTTTACAACTTTGAACCTCAATTTATCGCACATATG
ACTGAACAGGATATTGATTATTTAATGAAATTCCCAAACATTATTCATAAT 
CGTAAAAAGTTAGAAGCAATTGTGAGCCAAGCGAAGGGATATTTAAAAATT 
GAAAAAGACTATGGTAGTTTTAGTAAATTTTTATGGTCTTATGTAAATCAT 
CAACCCATAAATATGGGTTATAAAAAACCTAGAGATCGTAAAAAAGTTGAT 

CAAAGAGCTACTCAATTATCAAAAGATTTAAAAGCATATGGTTTTAAATTT
TTAGGTCCAGTAACCGTGTTCTCATTCTTAGAAGCTGCTGGATTGTATGAC 
TCACATCTTGAAGGATGTCCTTTCAAACCAAATCATGAGTGAAGCATAGAC 
ATAAGTTGCGCTGCGGATATTCAAACAATATGAGTTTTTCACGCATACTAA 
AAGTTTACAACGTTTAATTTATCTAGATTTTACTTTTAAGTAATTACAATT 

ACTCGCAAATATTTTCGAAGTTTTATTTAAAAATGTATAACTTAAGTCATT
AAAATCATTTATTATTTAATTTAACATTAATAAAATAACCAATTAGTGGCG 
CAATGACAAATAATATAAAAAATATTCTAAAAATGTGATAGCTTGAAATCA 
TCGCTACATCAGCTCCAGTAGCCATAGCCACTAAAACTATTTGACTCATAC 
CTCCTGGTGCTGCTCCTAAAAACAATTCATTGATGGATTCATTAGTAATCA 
AATGTATTCCTATTATCATTAT7VAACGTTGTGACTATGAGCATTATATTTT
GAAAGGCTATTGCTATTGCAATTCTTCCCTTTAAATCACTCATTAAGTTGG 
CAATCTGTAATCCAATACGTATCATATAAATAAGTT6CGCTGTGGCTAACA 
ACCAATGATCTAGTGAAAATGTTAAATGTGTTGTCATATTCCATATAATTA 
AAACTATTATAGGTGCTAATAATTGTTTAGTGGGGAAGTTAATTTTTGACA
TTCCTATATAGATGATTCCCACCATTGAGAATAAGATGATTATTTGCCATA 
TATTTAAAGTCTGAGAAAGTGTGGGTACTTCCATTGTAGTATGATTCATTT 
CATGATGGTTATCCTGAAAAAAATACGAAATAAGTGGTACTAAAATAACAA 
CAAATATTACACGTGATGTCTGTGTTAAACTCACAACTAATATATTTGCTT 

TCTTATTTTCTTCTGCCATCACTAACATTTGGCTTAGCGCACCTGGTATAA
CACTTAAAATTGCAGTTTCTAAATTTACTTGTGCAATTTTCTTAAAAAAGA 
ATGCAATTATCAAAGCTAATAAAATTAGTAGGATAGTGACAAAAACGATAG 
TTAGCCAATTTTTACTTATGTCTTTAATCACTTGTTGTGTGAAGGTAGAGC 
CAATTTGAACTCCAAGTAGTATTAAACCAATTTGACTCAACCAAAATGGCC 

ATCGTATTTTTAATTTTAATACTTTTACACATAATAACGCCGCTAATATTG
GTCCAAACATAAATGGCAAAATCACATGTAACATTTTTAATAACAGACTTA 
ACATAATAGCAATTAATAATATGATTAAATTATTAATAAAATTCCGCTTCA 
AAATACCACTCTACTTTCGTTAAAAGTAAGTCACAAAATTTCTAACTTTAT 
TTAAATTCAAAGAGTAGAACAAAATGCATCTTAAATTCCTAGTCCACCATC 

GAACCCCAACTTGCCCTGTCTGTTGAATTTCTCAGTGAAATTCTTTTTGTT
AGGGCCCCAAACCCCAACTCGCCTTGTCTGTTGAATTTCTAAGTGGAATTT 
ACTGTATTGATGACATAACATAGCTTTAAAAGTTGATATAAGTGCATTTTT 
AATTCAGTTATAAACTACTTATAAGTT7VAGAAAGTCTAGGACATTGACTTT 
TGCCCTAGACTCAATTAATTTAAATAATTATATTTTAAAATTATCACACAA 

TACGACTTAATGCATTATCAAATGCTTGGATAGTTTTCTCAATATCATCTT
TAGTATGTGCAGTTGATAAAAATGTTCCTTCAAATTGTGAAGGTGGTAGGA 
AAACACCTTCCTTAGCCATTTCTCTATACATATTACTAAATAATTTTAAAT 
CACTTTTATTTGCTTCCTCAAAATTTGTTACAGGCCCCTCATTTAAGAAGT 
AACCAATCATTGAACCAGCGCGATTTACTGTGATTGGAACATTATGCTTAG 

CAAATACCTCTTTTAATCCTTTTTCAAGTATATCTCCTAGAGAATTAAAAT
ACTCATAAGATTCAGGAGTAAGTTGACTCAATGTTTCATAACCACTAGTCA 
TTGCTAAAGGATTACCTGAAAGTGTGCCAGCTTGATAAATAGTCCCAACAG 
GAGCAATGTAATCCATAATTTCTTTTTTACCACCAAAAGCTCCAACGGGTA 
AACCTCCACCTATCACTTTTCCTAAGCAAGTTAAATCAGGTGTTACACCAA 

AGTATCCTTGCGCACAATTATAACCTACACGGAAACCAGTCATCACTTCAT
CAAATATAAGTAATGCTCCATATTCATTAGTAATATCTCTTAAACCTTGTA 
GAAATCCATTCACTGGAGGCACTACACCCATATTTCCAGCAACCGGTTCAA 
CAATAACACCAGCAATATCATCGCCATATTTTTCGAACGCTAATTTAAGTG 
AATCTTU^TCATTATATGGCACCGTGATAGTGTTTTTAGCAATACCTTCAG 

GGACGCCTGGTGAATCAGGTAAACCTAGTGTTGCAACACCTGATCCTGCTT
TAATCAATAAAGAATCACTGTGTCCATGATAACACCCTTCAAATTTTATAA 
TTTTATTACGTCCTGTATAACCCCTAGCTAAACGAAGTGTGTCTAAAGTAG 
CTTCAGTTCCTGAGGAAACCATTCTTACTTTTTCAATTGAAGGTACACGGT 
CAATCACAAGTTCAGCAAGTTTATTTTCTTGAAGTGTTGAAGCGCCGAAGC 

TTGTACCTTTATCTACTGCTTCATGTAATTTGGATATAACTTGTTGATTTT
TATGTCCCAGAATTAATGGGCCCCAACTTAGCACATAATCAATGTATTCAT 
TTCCATCAATATCATATATTTTAGATCCTTCACCATGATCCATAAAAATAG 
CTGGTGTGTCTACTGATTTAAATGCTCTTACGGGACTGTTAACACCGCCAG 
GCATTAATTTCTCAGCTTGCTCCATTGCTTTAATAGATTTTTCAAAACTCA 

TCATATGACCTCCTCAATTATCATCTATTTATCTAAATAACGACAGATATC
TTTTGCAAAATAAGTAATTATTAAATCAGCACCTGCACGTTTCATAGATAT 
CATTTGTTCCATAACAATTTTCTCTTCATCTATCCAACCATTTAACGCTGC 
TGCTTTTGTCATACTATATTCTCCACTAACGTTG 


Sequence 3401 

step.1002e12.cons.ok

GGTAGGTGTATTATTTATATTTACTACTTTGAATATGGCTTCAATTTCTGA 
TAGAAAAGTACATGCCAAAATTAAAAATATAAGAAGATAGGTGAGAT7UVAG 
TGGAAACAATAGGAGTATCATATATTTAAAAGAAGGTTCCCAAAAGTTAAT
GATTATCAATCGAGGACCTATTGTAGATATAGATAATCAAAAATATATATT 
TGATTATTCAGCATGTAATTATCCTGTCGGTGTCGTAGAAGATCAAATATA 
CTATTTTAACGAAGACAATATCGATAAAGTCGTTTTTGAAGGTTATTCTGA 
TCAAGATGAGATGAGATTTCAAGAATTATTTAAAGAAATGAAAAATAATTT
AGATGATGATATTCAACAAGGTATTGTTCAAAAGCAAGATAATTTAGGTTT 
AATTTAACATAAGTCATGATTATCTGT^GTTATCTATTATCGAATATTCAG 
AGATGATATAATATATGAGTCGCCTATCTCTCAGGCGTCAATTGACGAAGA 
GAGGAGGTGCATTAATTGCTAATCATTTTCGTTCACATCACGACCACAGTC 

CTCAGTTGTTGTATTATTGCATTATTTACTCATTGGATATGTAATTGCAAT
AATAAATAGGCAACCATAGACACACATAAAAATCCCCACACTATTCCAGTA 
GTGTGGGGATTGGTGCATTAATTGCTAATCATTTTCATTATCTAGATTATA 
ACACTGGTTAGTAGAGAAATGCAAAGATAGACAAATAATTAATGACTGAGG 
ATAAAGATATAAAACAAACATACTTTCCTCATTTTGATACCATAGCCACCA 

ACTTTTTACTTGTGTTCTATTTCTAATATTTAATTCTTTCATAATTTCTTT
TGTTGAAJUVTTCTGCTGCTTTCATTTCAACGGCTTTATACTTTGTTTCTAC 
TGAATAAGAAGCTCTTTATTTTGGAGCAACAGGTGTTTTAAGCTATCGAAA 
ATTTTTCCCATTTATTTATAATTTGTTTCAAAAAGGATACTTTGAAAATTC 
ACTAACAATTATTGGCAGTGGTTTAAATGAGTTAACTACTGATGAATTTAG 

AGAAAAAGTCAAAAATGCTATTCAAAATAATATTGAAAACTCAAAGGAAAT
TGGTGCGTTTTTAAAACGTTTATTTTATAAACAACAGGACGCTAATAGTAA 
AGATAGCTACCAAAAATTTTTGGAAATGAGTTTAGAACTAGATGATAAGTT 
TGACCTAAAAGAAAATCGATTGTTCTATCTTGCAATGTCCCCTAAATTTTT 
TGGAGTTGCAACAAACCACTTGAAAGAGTCTGGATTAACGAACGTAAAAGG 

GGTGATGCGAATAATTATAGAAAAACCATTTGGTGATGATTTAAAATCTGC
AAAAAATTAAATAATCAAATAAGAAAGTCTTTTAAGAAGAAAAAATATTTG 
GAATTGATCACTATTTAGGTAAAGAAATGATTCAAAATATTGAACGTCTAC 
GATTTTGAAACACTATATTTGAACCACTTTGGAATAATAA6TATATTTCCA 
ATATACAAGCAACTTCTTCAGAAACGATTAGTATAGAATATCGTGGTGGTT 

ACTACGAATCAAGCGGCGCCCTAAAATTCATGGTTCAAAATCATTTATTAC
AAATGGTTTAATTACTTGCAATAGAACCTCCAATTAGTAGAAAAAGTAGTG 
ATATAAGAAAAGAAAACATTTCAAGTCTTAAAATCCTTGAAATGTTTTAAT 
CCAAATGAAATTAAAGAAAGTTTCGTTCGTGGTCAATATGATGGGGGAATG 
ATGAATAATGAGTTCGTTCCTGCATATAGAAATGAACCTAATGTAAATTCA 

CAATCTAATACTGAAACTTTTGTAGCAGGTAAAATAGAAATTGAAAACTCT
AAATGGGCTAGTGTTACATTTTATATTCGTACAGAAAAAAGAATGAAAAAA 
ATCTATCCAAATCGTTATAGAGTTTAAAAAATCCAATGCAATTTTATTATA 
GTAATTGTGAACAGAATGCATCAAACTTATTAGTCATCAACGTACAACCTA 
ATGAAGGATTTTCTTTATGTGTGAATGGTAAGAAAAGTAATCAAAATAATG 

AAATGCAAAAAGTGAAGCTTTCTTATACTATGCCGATTAAAGATAAAATGA
ACACAGTTGATGCATATGAAAATCTTATTTACGATACATTAATTGGAGAAC 
AAACAAAATTTACGCATTGGGAAGAATTAAAAATTCTTGGAAATTTATTGA 
TGATATTGAAAATGTATGGAAACAAGAATAGCCACAGTTTCCTAATTATGC 
CTTTGGATGCTATGGGCCTAAAGAAA6TGAAAAATTACTTAGTGAAGACGG 

ATTGAATTGGTGGAATAATTTGTAATATAAAAGACCTAAAGTTGTCGTGTT
TTGAATGAGTAGTTAATCCTTAATTAATATGCAATAATCAATGTTATAAAG 
TTTATAAAGGGAGGACAATTAATGGATTTTATTAATATTACAGGTGCTTCA 
CAAAATAACTTGAAAAACATAGATGTAAATATCCCAAAACACTTAGTAACG 
GTATTTACAGGTCGTTCTGGTTCAGGGAAATCATCTTTAGTGTTTAATACT 

GTTGCTGCGGAGTCTGAACAGCTACTAAATGAAAGTTATTCTAGTTATATT
CAATTTCATTTAAATCAACAACCCAGACCGAAAGTAAAGAAAATTAAAAAT 
CTTCCTGTAGCAATGACGATTAATCAGAAAAGATTCAATGGGAATTCTCGC 
TCCACGGTAGGAACAGTTTCAGATATATATGCTTCTGTTAGATTACTGTGG 
TCTAGAATAGGCGAACCGTTTGTTGGTTATTCAGATGCATATTCCTTCAAT 

AGTCCTAAGGGCATGTGTAAAACTTGTGAGGGATTAGGATATATTGAAGAC
ATTAACTTAGATGAATTGCTAGATTGGGATAAGTCTTTAAATGAAGGTGCA 
ATAGACTTTCCTTCTTTTGGACCAGACAAAGAGCGTGGTAAAGCCTATCGA 
GATAGT 
Sequence 3402

step.1002f01.cons.ok

GTATTCAACAAGTTTTTTTAATTTATCTTTTCGTGTTAAATGTCTAAATCC 
TTTATCTAAACTTTTCATTTATGTACTCCTTTAAAATATCTTTATATTACT
GTAGTAATGTAACCTTTTAACTTTCGCTTTAAATATCAACTGCTAACATTG 
TCTTGCACTCAAATAATTCAAAACTTTTCACAATTTTTGTAAAGCGCTTAC 
ATATTTATTATATATTCCTGGTCATGTCTGTTATATTTCATTGTATAAAAC 
GAATTTTCTTTGGTATGATACTAGTGCTACTTCAAAAAATGAAAGGATGTC 

ACACCTTATGAATATAGGTATAGATAAAATAAGTTTCTATGTACCCAAATA
TTATGTAGACATGGCTAAACTTGCAGAAGCGCGCCAAGTCGATCCTAATAA 
ATTTTTAATTGGAATTGGTCAAACTGAAATGACTGTGAGCCCAGTGAATCA 
AGATATCGTATCTATGGGAGCCAATGCTGCTAAAGATATTATAACAGAAGA 
AGATAAAAAGAATATTGGTATGGTTATAGTAGCAACTGAGTCTGCGATTGA 

TAATGCCAAAGCAGCAGCCGTTCAAATTCACCATCTTTTAGGTATTCAACC
CTTTGCAAGATGCTTTGAAATGAAAGAGGCTTGTTATGCAGCAACACCTGC 
AATTCAACTTGCCAAAGATTATCTTGCTCAACGCCCTAACGAAAAGGTTCT 
TGTCATTGCTAGTGACACAGCTCGTTATGGTATTCATTCTGGTGGTGAGCC 
TACTCAAGGTGCCGGTGCAGTTGCAATGATGATTTCACATAACCCAAGTAT 

TTTAAAACTTAATGATGATGCCGTAGCATATACTGAAGACGTTTATGATTT
CT6GCGTCCAACGGGTCATCAATATCCCTTAGTTGCTGGTGCATTGTCGAA 
AGATGCCTATATCAAGTCATTCCAAGAAAGTTGGAATGAATATGCACGTCG 
CCATAATAAAACACTCGCTGATTTCGCTTCACTATGTTTCCATGTACCATT 
CACCAAAATGGGACAAAAAGCTTTAGATTCTATTATTAATCATGCCGATGA 

AACTACACAAGACCGTCTTAACTCTAGTTACCAAGATGCAGTTGATTATAA
TCGTTATGTCGGTAATATTTACACAGGGTCCTTATATTTAAGTCTCATCTC 
TTTATTAGAAACACGTGATTTAAAAGGCGGACAAACGATTGGTCTCTTTAG 
TTATGGTTCTGGTTCTGTAGGCGAGTTCTTTAGTGGAACATTAGTAGATGG 
ATTCAAGGAGCAATTAGATGTTGAGCGCCACAAATTTTTATTAAATAATAG 

AATAGAGGTTTCTGTTGATGAATATGAACATTTCTTCAAACGCTTTGACCA
ATTAGAATTGAATCATGAACTTGAAAAATCAAATGCAGATCGTGACATTTT 
CTATTTAAAATCTATTGATAACAATATTCGTGAATATCATATAGCAGAATA 
ATCTAAACTCAATTCTTTTGCTTTTTTAATTTTTTAAAATGAAAGAAACGA 
ATTTACTATTCATACTTGCCCCCTAAAAACAACGTCATATAGCTGTTTTTA 

GGGGGTTTTTAATCCCAAGATAGTGCATTTCATCTATTATAGACCTCTACT
ATTTATATACTGTTTACCTACTATTCATAGATAAAAACGAATCCTATCCAT 
AGATCAACATTGACAAATAACTAAAGCAATTCTATACAATAACGTATCTCA 
AGGTTTCGTTGTTTTTTTAGGTCTGTATAAATGTGTCATATCTATTTCTTC 
CAATTTTAATAATTTTATTTTGTTGTCTAAAGTACCTCCGTATCCTGTTAA 

ATTTCCATCTTTACCAACCACTCGATGACATGGTATAAGTATAGAAATAGG
ATTACTCCCAACAGCACCTCCTACAGCCTGAGCAGACATTGCTGGCTTATT 
CATTGCTCGTCCTACTGTTTTAGCAATCTCTCCATATGTTTTAAGTTCTCT 
ATAACGAATTGTCTGAAGTTCATTCCATACCTTCTGTTGAAAATCAGTGCC 
TTTTGGTGCTAACGGTACTTTTATCTCAGGATAATTCCCTTTAAAATACTC 

TCTTAACCACGCTTTAGTATCTTTAAAAACATCTAACGTATCCTTTGTTTT
AATTCCTTTCAAATGTTGATTAGGGTATAAGACATGTGACAACGAGATGCC 
ATCGTTTGTAATTAATTGCAATAAGCCAACGGGCGAGTCATACATTGATTG 
ATACATATTTCTCCAACTCCTTAAATGTGTATATCATTCTAGTCATATCCT 
TATAAACTAACGCATTCTATTCTGAACATAATTTTATAATATAGATAATTG 

CTATATTTATGAAATTGGATAAACATTCGAATAGAATGTGATACACCAAAG
AATTGATACGTTTTATTATCTTTAGCAACCTCTTTTAATTTCTTAGTATCT 
TCATGCTTAACATAATTATTTAATTGCTTTTCACTTGAATGACACGCCCAT 
GAATCAAAATAGAAAAAGTTTACAAAAATATAACAAATTGCTAAAAAAATT 
ATTATACCAATAACATAGAATACTTTTTTCATTAAAGTTCACCTAGTTTTC 

AAAAAAATTTTTATAATTTATGATAACTTAAAAGTGTATATTTTTTCTACA
TAATATATTTAATTTTCAAGGAGATGTAAAAAAGTTGAAAAATTTCGCAAA 
ACTAATTTTGGTGGGCATTTTAGTATCGGGGTCAGGGATAGCGAGTGTACA 
AACAAATATAACTCACGCAAAAGAAAGTCACGATTCAACTCCTCAAAATAT 
TAAATTAGTGGGAACGTATGATACTTCTCAAGTTGATTCCAAAACGATGAA 
ACAATTTAAAGAAATAGAAAAAGAAGATAATAATTTCCACAT



Sequence 3403 
step.1002f02.cons.ok

CAGAAGGATCAATCACACCATAGCGATGTGTATCAGATTCTGGCACTTCTT 
GAACACCTATTACTGAATGGCCTGTTTCTTCATAAACATCCATTAATTGTT 
TAATAGCTGGTGTATCAGACTCTACAATGTCATCTCCTAATAACACTGCAA 
ATGGTTCGTTACCTATAAACTGTTTTGCAGTATGAATTGCATGTCCTAGCC 

CTTTTTGTTCTTTTTGTCGCACGTAAAAAATATTAGCTAAATCTGTTGAAT
ATTGTACTTTTTCAAGTAAATCTGCTTTTCCTTTACTTTCAAGTACTATCT 
CTAGTTCTTTTTGATTGTCAAAGTGATCCTCAATTGCACGTTTATGCTTGC 
CAGTCACTATAATAATATCTTCTATTCCTGCATTAAAAGCTTCTTCTACAA 
TATATTGAATTGTTGGTTTATCTiW^TATTGGTAACATTTCTTTTGGCATCG 

CCTTAGTTGCTGGTAAAAAACGAGTACCTAAACCAGCAGCTGGTATGATTG
CTTTTTTTATGTTTTTCAACGTATAAACTCCTTTAACAACCTAAGTTTTTG 
ACTACTTAATAAAGTGTATCACTATTTAAAATGCTTTCATAGTAATGTTTT 
TTCACCACAATCTTTTTCAATATTAACATTAACTTAATCAAGAATTGAGTT 
AGAAATTAAAAATACTCGAATAGAAATTAACTTTGTTTTAAATATATCGGT 

GACAGACATCTACTGAAAAACACTCGCTATATTTTTAAAATGAGAGCTTCA
GGAATTTTTTCTCTATACAAATAAACTGATAAATAAAATAAATAAAATACC 
AGATACTGAAATAATTGTCTCTAATAACGACCATGTTAAAAATGTTTCTTT 
AACTGTCAGCCCGAAATATTCTTTAAACATCCAGAATCCAGCATCATTGAC 
GTGAGAGAGAATTACGCTACCTGCACCTATTGCAAGAACGACCAAAGCTAC 

ATTAACATCTGAATGTTCTAAAAGTGGTAACACAATGCCTGTTGTTGATAC
GGCAGCAACTGTAGCTGATCCTAATGAAATCCTTAAGACTGCAGCTACAAT 
CCATGCTAATAAAATGGGCGACATGCTTGTTCCTTCAAACATCTTAGCGAT 
TGTATCACCTACGCCACCATCGATGAGCACTTGTTTAAATGTACCACCACC 
ACCGATGATGAGTAACATCATGCCGATTGGATAAATAGCATGCGTAACTGA 

TTTCATAATGTCTTCCATTTTCCGTTGTTGCTTCATTCCCATTGTAAAGAT
TGCAAATATCACTGCGATAAGCATTGCAGTACCAGCAGTTCCTATGAAGTA 
AACAATTTGTTCGAATACATTTGTTGCTTCTTCGTGTCCTGTACTATGGAT 
GGTGACGGTGCAGCTGCAATGCGTTATACCGAAGCACGTATGACTAAAATA 
ACATTAGAACTTTTACGTGATATTAACAAAGACACAATTGATTTTATTGAC 

AACTATGATGGTAATGAAAGAGAGCCGTCAGTCTTACCTGCACGTTTCCCT
AACTTACTAGTAAATGGTGCGGCAGGAATTGCCGTAGGTATGGCTACAAAT 
ATTCCTCCCCACAATTTAACTGAAGTTATTGATGGTGTGCTCAGTTTAAGT
AAGAATCCAGACATCACAATTAATGAGCTGATGGAAGACATACAAGGTCCT 
GATTTTCCTACAGCTGGTTTAGTACTAGGGAAAAGTGGTATTC6TCGAGCT 

TATGAAACAGGTCGTGGGTCAATTCAAATGCGTTCTCGTGCTGAAATAGAA
GAACGTGGTGGTGGCCGTCAACGTATTGTCGTAACGGAAATACCTTTCCAA 
GTCAATAAAGCGCGTATGATTGAAAAAATCGCAGAGTTAGTTAGAGATAAG 
AAAATCGACGGTATTACAGATTTACGTGATGAAACAAGTTTGCGTACAGGT 
GTAAGAGTAGTTATTGATGTACGTAAAGATGCAAATGCGAGTGTTATTTTA 

AATAATTTATATAAACAAACGCCATTACAAACATCATTTGGTGTGAATATG
ATTGCTTTAGTGAATGGTAGACCTAAACTAATCAATTTAAAAGAAGCACTT 
ATCCATTACTTAGAACACCAAAAAACAGTGGTTA6ACGACGTACTGAATAT 
AATCTTAAA/^GCAAGAGACCGTGCCCATATTCTAGAAGGTTTACGAATA 
GCACTAGATCATATTGATGAAATTATCACAACAATTCGTGAATCGGACACT 

GATAAAATTGCGATGGCAAGTTTACAAGAGCGTTTTAAACTAACTGAACGT
CAAGCTCAAGCAATTTTAGATATGCGTTTAAGACGTTTAACTGGATTAGAA 
AGAGATAAAATAGAATCTGAGTATAATGAACTTCTAGAATATATTAAAGAG 
TTAGAAGAGATTTTAGCTGATGAAGAAGTAOTATTACAATTAGTTCGTGAT 
GAATTGACTGAAATTAAAGAACGTTTCGGCGATGAACGTCGCACTGAAATT 

CAATTAGGTGGTCTAGAAGATCTTGAAGATGAAGACTTAATCCCTGAAGAA
CAAATTGTTATTACATTAAGTCATAATAACTATATTAAACGTTTACCAGTA 
TCTACATATCGTTCTCAAAATCGTGGTGGTCGTGGCATACAAGGTATGAAC 
ACGTTGGATGAGGACTTCGTTAGTCAATTGGTAACAATGAGTACACATGAT 
TATGTTCTGTTCTTTACGAATAAAGGTCGTGTATATAAACTCAAAGGTTAT 
GAAGTTCCTGAGTTGTCACGTCAATCCAAAGGCATACCTATTATTAATGCG
ATTCAACTCGAAAATGACGAAACAATAAGTACGATGATTGCAGTTAAAGAC 
CTTCAAAGTGAAGAAGATTATCTCGTATTTGCGACAAAACAAGGTATCGTT 
AAACGTTCATCATTAAGTAACTTCTCCCGTATTAACAAAAACGGTAAAATT 
GCAATTAACTTTAAAGAAGATGATGAATTAATTGCAG



Sequence 3404 
step.1002f03.cons.ok

GGACAGTATTCCTATTATCACACCCATAATAATGGGACGAATACTGGTTTG
AACTTGTAATAATGAGATATTCATCATTATTAACGAAACACATATCACTGT 
AAACATATAAACACTCAAAGTTAGAGTTAAATGATTACTTTTAAAGAATAC 
AACATTCATAAGTGGAAGCAGATTAATTAATCCTATACTAGCTGCTGTACT 
TATTACTACCGTGATTTTAATTGATGCATTTGCATAACGATTCATATGAAT 

TTGATTATGTTCACGAATTGCTTGAGTAAGTAATGGGATAAGAACGAAACT
AAAAGTCGTAGTTACAATCAAACCCATTTGTATAAATGAAGCACCACGATC 
ATAAATGCCTTTTTGAATAATTGCTTCTTTAAAAGCAATACCGCTATGTTG 
TAATAAACGTATTATTGTAAAACTATCCACAACTTGCCATAAAATAACGAT 
AAGTTGACTCAATGCAAATATGGATATGGAAATAAACAACTGCTTCCATTG 

AATGGAAGTATTATTAAAGCGATAGCATAACTTAAGTTTAAGTGGTTTTTT
AAGTAATAAATATAACATTGAACCTAAAAAACCAATCGAAGATGCCAATAT 
AGCTAATGCTCCTGCTTGATAAATAGACCAGTGTTTCATTGAAAACATAAT 
AATTGCAACAATGATTAAACTAACTCTAATTACCTGTTCTATAACCTGGGA 
AATTGCTGGTATGGTCATTACTTGTTTTGATTGATAAAATCCTCTTAACAC 

TCCTAAGACACCTATTAAAATAAAACTAAAACTGGCCATCTTTAACATGGG
TGCTAAATTAGAATCGCCCATCCATCGGGTAATCATATTCGCAAACATAAA 
AAGCAAAATAAAGACGATAAAACCTATGCATTGTAATCGAAACATAACCCT 
TGTATAGACTTCATCGGATCGATTAACACCTATCACTTGAGTCACAGCACT 
TGGAATAGCATTCATAGATAAAATAACCCCTAGTGCTACGACAGGATATAT 

TTGTTGATAAGCATATAAACCGTCATCACCTAAAACATTTTGATACGGAAT
GCGATAAATGGCACTTAGTATCTTCACAATAATTAATGCTTU^GGTAAGTAT 
AACGACGCCGTTAAACGCCGTCTCACTCTTAGTCTTCATCTTTAATCGCCA 
TACTTTCTTCAATACATCTAACTAAGAATTTCAAACTATCTAACCATTGTT 
TTGATTTTGTTAGCGTTACATTCATTGCATTATTTTGCACGCCAACTTTCA 

TTGCTCTACCAAGAGGTTGCGTCTGTTTAAACAATTCTTCTCCATTAATAT
CTTCAGTCGCTTTAGGTGATAAAATGATTTGTATAGATTTGCCTTTGTCTT 
TTATCAATTCGACACCCGCATGTAGAGCGTGGACTTTGATTTCAACAATAT 
CTAATAATCGTTCGACTTCAATTGGATAATCATTAAAACGATCTATTAATT 
CATCTTTGACATCGAAAAGTTGTTCTTCAGTTTCTACITTTCGAAGTTTTT 

TATAAATCTCAATTTTAGCCTGTTCACTTTGTATATATTCAGCTGGTAAAT
AAGCATCTAAGTGCAATTCTACTTCAATATCTGGTGCATCCGGCGATTCTT 
CTTTAATGCCACGTTTTTCGTTTACTGCTTCTTCTAACATTTGAGAGTATA 
AATCGAAACCAACCGAATCAATAAAGCCATGTTGTTGCTTACCGAGTAAAT 
TGCCTGCACCACGAATATTTAAATCTCGCATAGCGATTTTAAAACCTGAAC 

CTAGTTCGGTAAACTCCTTAATAGCTTGCAATCGCTCTTCAGCAGTCTCAT
TTAACACTTTGTTAGCTGGATGTAAGAAATAAGCGTAACCAATTCTACTTG 
AACGTCCTACACGTCCTCTTAATTGGTATAGCTGGCTTAAACCAAAACGAT 
CAGCCTCTTCTATGATTAAAGTATTAGCATTTGGTACATCTACACCTGTTT 
CAATAATTGTAGTCGTTACTAAAATATCGTACTCGTGATTAATAAAGCTTA 

ACATTGTTTCCTCTAAATCACGTTCAGTCATTTGGCCATGTGCTACAGCAA
TGTTAGCGTCAGGCATTAACCTTTGAAGTTGTTCTCTTTTTTCATAAATGG 
ACTGCACTTTGTTATACAAATAAAATACTTGTCCATCGCGAGATAATTCAC 
GCTCTAATGCCTCTTTAATAAAGTTCGTATTCTGTTCTAAGACATAAGT^ 
GTACAGGAAAACGATTTTCAGGTGGTGTTTCAATCACTGATAAGTCACGTA 

CACCTAACATACTCATATGCAATGTTCTTGGTATTGGTGTTGCAGTAAGCG
TCAGTACATCAACGTTTTTTTTCAAAGTTTTAATGCGTTCTTTATGTCGCA 
CTCCAAAACGTTGTTCTTCATCAACAATAAGCAATCCCAAATCTTTATATT 
GAATATCTTTACCTAATAATTTATGTGTACCTACGACAATGTCAACATATC 
CTGATTTGAGCCCTTCTTTAGTTTCCCTTATTTCTTTAGCTGTGCGGAATC 
GACTTACCAATTGAATTTCGACCGGAAAATCTTGCATACGTTCAAGTAGTG
TTTCGTAGTGTTGTTGTGCAAGAATCGTTGTTGGTACTAAAAACGCAACTT 

GCTTACCATCCATGACAGCTTTAAACGCTGCACGGACAGCCACTTCAGTTT 
TACCATATCCCACATCACCACAAAGAAGACGGTCCATAGGTCTTGCTCTTT 
CCATGTCACCTTTAATTTCATCGAATGACTTACTTTGATCGGGTGTTAATT
CATATGGAAAATCATGCTCAAATGCTGATTGCTCTGCAGTATCTTGGCCGT 
ATTGATAACCTACAGACATTTCACGTTCTTTATACAAATCAATTAATTCAT 
CAGCTATATCTTCAACACTTTGTTGAACTTTCGCCTTTGTTTTTTTCCATT 
CTGTCCCGCCTAATTTATTTAATCTAGGCGACTTATCTTCAGAAGCCACAT 

ACTTTTGAACTTGATCCATTTGATCAACTGGAACGAATAATTGATCAGTTC
CTTCCACCTCGAACCTATTAACCTCGTCATCTTCGAGGGATCTTATAACCG 
AAGTTGGGAAATCTCATCTTGAGGGGGGCTTCATGCTTAGATGCTTTCAGC 
ACTTATCCCGTCCATACATAGCTACCCAGCTATGCCGTTGGCACGACAACT 
GGTACACCAGAGGTATGTCCATCCCGGTCCTCTCGTACTAAGGACAGCTCC 

TCTCAAATTTCCTACGCCCACGACGGATAGGGACCGAACTGTCTCACGACG
TTCTGAACCCAGCTCGCGTACCGCTTTAATGGGCGAACAGCCCAACCCTTG 
GGACCGACTACAGCCCCAGGATGCGATGAGCCGACATCGAGGTGCCAAACC 
TCCCCGTCGATGTGAACTCTTGGGGGAGATAAGCCTGTTATCCCCGGGGTA 
GCTTTTATCCGTTGAGCGATGGCCCTTCCATGCGGAACCACCGGATCACTA 

AGTCCGTCTTTCGACCCTGCTCGACTTGTAGGTCTCGCAGTCAAGCTCCCT
TATGCCTTTACACTCTATGAATGATTTCCAACCATTCTGAGGGAACCTTTG 
AGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGCC 
TGACACTGTCTCCCACCACGATAAGTGGTGCGGGTTAG 


Sequence 3405 

step.1002f05.cons.ok

GTGGTATTAAACCAGATACTATCAAAGATATTGTTGCTGAAGATCCAGATT 
TAGTTATTGTTGGTGGCGGTATTGCGAATGCTGACGATCCTGTAGAAGCAG 

CAAAACAATGTAGAGCAGCTATTGAAGGTAAATAAGATGAGTGAATTTAAT
AATTATCGTCTTATTCTTGAAGAGTTAGATTCTACTTTATCTCAAGTAGAT 
AATACAGAGTATGAACGTTTTGCTAATGATGTTATAGGTGCAGATCGCATA 
TTTACAGCTGGTAAAGGTCGTTCAGGTTTTGTTGCTAATAGTTTTGCAATG 
CGCTTAAATCAATTAGGTAAAAATGCCTACGTTGTAGGTGAGTCAACAACA 

CCTTCAATTAAAGAACATGATTTGTTTATTATTATTTCAGGTTCAGGTTCT
ACAGAACATTTAAGATTATTAGCTGAAAAAGCACAATCTGAGGGTGCAAAA 
ATTGTCTTATTAACTACAAATGCGGAATCGCCAATCGGTAATCTTGCAGAG 
ACGGTTGTTGAATTGCCTGCAGGTACTAAACATGATGTTGAGGGTTCGAAA 
CAACCACTTGGTAGTTTATTTGAACAGGCTTCACTTATATTCTTAGATAGT 

GTTGTATTACCTTTAATGGATGCATTTCACATTAGTGAAAAAACAATGCAA
GAGAATCATGCTAATTTAGAATAACTAGAATGAGAGATGAGCACTTTTGTC 
TATGATTAGACAAAGGTGCTTTTTTAATAAATGGTGCAACATTACTCCTTT 
AAGTGTATATTTATAATATAACTTTAACTATGAGGTGAACATCATGAATTT 
TGATAGTTATATTTTTGATTTTGATGGAACGCTAATTGATACAACAACATG 

TCACGTCAAAGCTACGCAAAGCGCTTTTAAAAGATTAAATTTAGATGAACC
TACAGAACAAGCTATTTTACATACATATCATTTAAATTTATATAACAATTT 
TAAAGCGCTAGCTTCACATGAACTGTCTTTTTATCAAATAGAAAAATTAAT 
AGATGAATACAATCATTGTTTTAGCAACGATGAAATACATCAATCAAAAGA 
ATATACCGGAATAAGTGAAGCATTAAAATTTTTACATAACCAAAAGAAAAA 

AATATTTGTAGTGTCTAATAAAGAAATACTAACAACTCAAAAGTATTTAGA
TTATCTCGGATTAAGCCGTTTTATAACTGATTCATTAGGTGTCTGTATTAA 
AAATGAAGACAAACTTCTTTGTGAAACGATTCAAAATTTGATACAGAAACA 
TCATTTAATGATAGGTAAAACCGTGTATATAGGGGACACAGCACAGAATAT 
CAAGAGTGCGAATCAAGCTCATGTGCAAACATGCGCTGTCACATGGGGAGC 

ACAATCTGCACACGAATTGTTGCATGAAAATCCTCATTATATTGTTAATGA
TCCAGAAGAATTTTTAACAATTTTATAATATTTTCAATGGTAGGATGATAA 
TTCATGAGGAATTTTATCCTACTCTTTTTTTAAAACTATGAAGCAAGTATA 
ATGTAAACATAATTATATAAAGTGTTTACAAATGGTGATGAAGATGAAAAA 
AATAATGGAATATTTACAGCATTATATTAATCAATACCCGCATCGATTAGC 
TTTAGTGTTTGAAGATCGACATCTAACGTATGGAGAATTAAGTAAAGAAAT
TTATCAGGCTAGTATGCGCTATAAAGAAGTAAAATTAAACAAAAAAGTAGG 
TCTAATGGATGAACATCCTGTAAATAATATTATTAACTATTTTGCGGTACA 
TCAAAGAGGTGGAATTCCTTGCATTTTTAATCATCAATGGAGTAATGAAAG 
GATACATCAACTTGTAAAAAGTTATGACATACAATGGTTAATTAAAGATAA
TCATCTTACCTCAAATCATGATAACTCAATTTATAATGATGAGGTTATCCC 
ACGTAATGTTATACATATAGGTTTCACGTCAGGAACTACAGGTTTACCCAA 
AGCGTTTTATAG/U^TGAACATTCTTGGATAGTTTCTTTTAAGGAAAATGA 
GAAATTACTCCAGCATTGTGAAGAAACCATTGTAGCACCGGGTCCTTTATC 

ACATTCACTTTCATTGTACGCATGTATTTATGCATTAAGTACTGGAAAAAC
ATTTATAGGTCAAAAAAATTTTAATCCACTATCTCTTATGCGTCTTATTAA 
TCAATTGAACAAAACGACAGCAATATTTGTAGTGCCAACGATGGTACAACA 
ACTTATTTCAACTCAACGACATTGTTCATCGATTAAAAGTATTTTGAGTAG 
TGGTGCTAAACTTACATTGCAACAGTTTCAACAAATCAGAAATTTATATCC 

ACAAGCAAATTTAATAGAATTTTTTGGGACATCTGAAGCAAGTTTTATAAG
CTACAATTTTAACCAATCATCTCCTGCTAATTCTGTTGGTAAACTTTTCCC 
TCATGTCGAGACACGATTATTAAATCAAGATGATGATGCAGTAGGATTATT 
AGCCGTTAGAAGTGAAATGGTGTTTAGTGGTTATGTTGGACAAAGCAATCA 
AGAGGGGGCATGGATTAAAACAGGCGACTTCGCTTATATTAAAAATCAACA 

TTTGTTTTTAGTAGGTAGAGAGAGTGATCGTATTATAGTTGGGGGGATTAA
TGTATATCCAACAGCTATTGAAAGCTTAATTATGGATATTGAAGGCATTGA 
TGAGGCCCTTGTCATTGGTATACCACATGCTAAATTTGGAGAAATAGCGAT 
ATTGCTTTATTCAGGTAAAGTACAATTGAATTACCGACAAATTAAATCTTT 
TTTAATGAAACATCTTTCAAGACAAGAAGTTCCATCAAAATTAAAGAAAAT 

TGACCATATGATTTATACAGAATCAGGAAAGATTGCTAGAAAAGAGATGAA
AAATAAATTTATTAATGGAGAGTTATAAAAATGAAACAACCTGTTATTATT 
GCAGCAAAACGTATAGCTTTCGGTAAGTATGGTGGCCGATTGAGGCATTTA 
GAACCTGAATCATTACTAGAACCTTTATTTAATCATTTTACAGATCAGTAT 
CCAAAAGTAATGTCTCTTTTGGATGACGTCATTTTAGGTAATACGGTAGGT 

AATG6GGGGAATTTAGCTAGAAAATCATTACTTGAAGCGGGATTAGATTTT
AAAATACCTGGTATAACAATTGATCGTCAATGTGGCTCAGGTCTTGAAGCC 
GTTATA 



Sequence 3406

step.1002f06.cons.ok

TGAAGTTTAAGTCTTTTAATAAATCTACAACAAAAACTTTACGTTCTCCTT 
TGAAAAATTTACCGATTGCTGGGATAAATTTTTCATGATCTTTTAGCATAT 
ATATCAGGAAGAAAGGCACCATTATTAATAAAAATACAGTTGAGATGAAAG 

ACGTAATATATGACACTGAATTAGATAAAATAGACGTTGCGCCATCTCCCA
TGGATTTAACCGCTTTATTAATACGATGTGTGACATCATCTGGTAATTTAT 
CCATTTGTCTTAACGAGAAATTAATTAATTGTTCGGCTTCTTTTTGTAATG 
ATGGTGTTTGTTTAATTAAGTTATTAATATTGGAAATAATGATAGGTGCTA 
TAAATGATACCACAATAGCGATGATAGCTATTAATCCTATGAATATTGTTG 

TTATACTAGCCCAACGTGGAAAGCCCCACTTTTCTAATATGTTTTGAAAAG
GTAAACATATGTAAAAGAGAAATCCACTAATTAAAAATGGAAGGAAGACTG 
AACCAATGATAGTAGCTATTGGAGCAAATACTTCATGCACTTCCATAAATA 
GTTTGATGAGTATGAACAGCATAATAAAAAATATTCCTGTTCTAAACCAAA 
CCTTATTGAACATAGATTGTGATCCTCCTATTTTATTCTTTTTCATACTCA 

ATAAGTATATATCAAAACTCAGTCATTAAAAAGAGTAAATACTTTTACAAT
ATAACTGCATGCACTTTATTTTTAATAAAATGAAAAAACTAGTATGAAATT 
AATTTGTATACGAGCTTGCTATAAATTTAAAAGTATTATAAGTATTGATAC 
AATAGTATTAACATTACAAGGGAGCATCGGTTTATGTCGGAGTTTAAAGTA 
GGTAAGATTAATAAAAAAATAATACATAGTGATATTTTAAATAGAGATGTA 

ACATTATCGGTTTATTTACCTGAGGATTATACAAACTTATTTAAATATCAG
TTGATTTTTTGCTTTGATGGTTTAGATTTTTTTAGATTTGGTAGAATACAA 
CGCATATATGAACAGTTACGTGAAGAACAATCAATACAACGTGCAATTATA 
GTAGGATTCCATTACGAAGATGTTGAAAAACGTAGGGAAGAATTTCATCCT 
TCAGGAAGTCGTTCTAATTTAACCATTAAAGCAATGGGAAAAGAAATTCTT 
CCTTATATCGATGCGACATTTCCAACTTATAAAGTAGGTAATACAAGGTTA
CTTATTGGAGATAGTTTAGCAGGAAGTATCGCTTTAATGACTGCAATGACT 
TACCCAACTATTTTTAGTCGAGTTGCGTTATTGAGCCCAATGTATAATGAA 
AATATTAAGAAAAAAATTGATACATGTATGAATAAAGGTCAATTGACGATA 
TGGCATGCCATTGGTTTAGAAGAAGCAGATTTTATTTTACCAACTAATGGT
AAAAGAGCTAACTTTTTAACACCTAACCGTGAATTAAATCAACTGATTAAA 
GAAGATAATATTGAATATTTCTATAAAGAATTTAACGGTGGACATCATTGG 
AAATCATGGAAACCATTGCTAGGAGATATTCTCTTACAATTTTTAGGTGAT 
CCAATAAATGGAAAATATGTTTAATAAAGTAAGATAAATAAAATAGTATGT 

TAGCGAATGGTTGTTGAATGCGAAACACTATTAATTTCAGAAATGTAATTG
TTTTCATGATAAAAGTAACGGTTTTCAAAAAGTTTTGATATAATAGGAATA 
AGTTAAACAAAGGAGGAATTTAAATGATTTTAGGATTAGCATTGGTTCCGT 
CAAAGTCATTTCAAGATGAGGTGAATGCTTATCGCAAGCGATATGACAATC 
ATTATGCTCAAATAATGCCTCATATCACGATTAAACCTCAATTTGAAATCG 

ATGATCATGATTTTAATTTAATTAAAAATGAAGTGAAAAATCGAATTTCTA
GTATTAAACCAGTAGAAGTACATGCTACAAAGGCATCTAATTTCGCTCCAA 
TCAGTAATGTTATATACTTCAAAGTTGCTAAAACAGAGTCATTAGATCAAT 
TATTTAATCAATTTAATACAGAAGATTTTTACGGTACAGCTGAACATCCTT 
TTGTACCACATTTTACAATTGCCCAAGGTCTAACAAGTCAAGAATTTGAAG 

ATATATATGGTCAAGTAAAATTAGCAGGGGTAGACCATAGAGAAATAATTG
AAGAACTATCGTTACTTCAATATAGTGAAGAAGAGGACAAATGGACTATTA 
TTGAAACTTTTACATTAGGATAAAAAGTTCAAAATTGTAAAGTGAAATTGA 
ATTTACAGAATCATTATTGTTAAATACGTGAAGAGCGCTTAATCAACTAAA 
AACGCCAAATCCTATTGTGTTTCAGTGGGATTTGGCGTTTTTATATGAAAC 

ATAATTATATTGTATAGTTAATCATTCAAAAAGATTAATGTACGTTTTATA
TTGAATATTCATATTTCAAATGCATATGAATGACGTTTCATTAATATATAC 
AACTAATCTTTATATAAACGATTTATAGATTTTAAGGTTTAACCCTTCTGT 
TTGTTTTAATAAAGTAATATCCGTAAAATACTGCAAGTGCAAGAAAAATCA 
TCGCTGAAAAGTAAAAGGTATTATTTAAATTATTAGTAAATTGAGTAATTA 

GACCTCCGACTAGTGGGCCTATCATTGAACCGAAGCCTTGAACACTGTTGA
ACACGCCCCATGTTTCTTCCTGTTCGTTAGGATTAATATGCCCAGCCATAA 
AGGTATTCCAAGCCGGTAAGAGGATACCGTACATTAGCCCAATAAAAAGTC 
CTATGGCCCAAACTATATATATATTTGTAATTGTAGATAGCCCGAATATAA 
GAATTGTATATAGTATAAAGCCACTAAAAATAACTCCATACATAAACCCTT 

TGCTATTATTGTCGATGATTTTTGATAAAAATAACATAGAGAAAGCACAGC
CTATGCCACCAATAATGATTGCTACTGTATATTCAACGGTTGATACTTTCA 
CAACTTGCGTTGCATATTTTGGAAGAATAGGTACAAGTGCTGCTATAGCTG 
CTCCTTGTAACAAGATACCCGGAAATAGAATAAGATGACG 


Sequence 3407 

step.1002f08.cons.ok

TAACGTGTGTCTCGCTATCAGTAATTACTGCAAAAGGTGTCACTTCAGAAC 
CAGTTCCTGATGTCGTTGGTATACATATAAATTTTGCGTTTTTAGGTTTGG 

TAATTTTATAAGTACGTTTACGAATATCTAAGAACTTTTGTTTTGCCCCAA
AAAATGAAGTTTCTGGATGCTCAAAGAACATCCATATTGCTTTGGCTGCAT 
CCATTGCCGAACCGCCACCGAGTGCAATAATAGTATTAGGTTGGAAATTTA 
TAAACATTTCTAACCCCTTATAGACTGTATGAGTTGATGGATTAGGTTCAA 
CTTCGTTAAACACTTTGATTTGTGGTTGGTTTTCTCGGCGTCTCAGCACTT 

GTTCAACTATATCAGTATAACCAATATTAACCATTCCTGGATCACAAACTA
TCATTACACGTTCAACATTATCCATCTCTGTCAAATACATAACTGAATTTT 
CTTCAAAATAAACTTTAGGTGGGAGTTTAAACCATTGCATATTATTACGAC 
GTTTTGCTATTGTTTTAATATTTAATAAGTCTACTGCGCTTACATTATGAG 
AAATAGAATTTCTACCATATGAACCACAACCTAACGTGAGTGAAGGAATGA 

GTTCATTATACATATTTCCAATTCCTCCGACAGCAGAAGGTGTATTTACCA
ATACACGGCAAGCTTTCATTTTTAGACCGAATTTTTGTTGTAATTGACTAT 
CCTCGGTGTGAATTACAGCAGTGTGACCTAAACCACCAAATTTTAATATGT 
CTTCACAAATTTGTAGTGCATGTCCTGTTGATTTTGCAGTTACCATTGCGA 
GTACAGGTGATAATTTTTCACGTGATAAAGGATAATCTTTTCCAATTCCAT 
CAATTTCTGCGACTAATAATTTTGTTTTTTCTGGAACACTAATTCCTGACA
ATTTCGCTATATCTACAGCAGATTTACCAACTATATCAGGTTTAACTGCAG 
TTTTATCTTCATTCATGATGGCATCTTCTAATTGTTGTAGTTCATTTTTAT 
T/VACAAAATATGTTTGGTGTAATTTGAATTCTTTAACGACGTCAGTGTATA 
CTTCTTTATCAACAACCATGACTTGTTCAGAAGCACAAATCATACCATTAT
CAAAAGTTTTAGAACCAATGATATCATTAACAGCACGTTTGATATGAGCAG 
TTTTTTCAATATAAGTAGGAACATTACCTGGACCGACTCCTAATGCAGGTT 
TACCTGTCGAATATGCGGACTTTACCATTCCAGAGCCTCCAGTCGCTAGAA 
CTAAAGCAATATCTTTATGATTCATTAATTGTTTAGTTGCCTCAATTGATG 

GCACTTCTATCCATTGAATACAATCTTTAGGAGCACercCOTTTGTTGCAG
CTTCTAAAATGACTTTAGCAGCATATTTTGATGATTGTTGTGCACTTGGAT 
GAAATGCGAAAATAATTGGATTACCTGTTTTAATAGCAATCATTGCTTTGA 
AAATAGTTGTAGATGTAGGGTTGGTCGTTGGTGTCACTCCACAAATCACGC 
CTATAGGTTCAGCTACATACGTTAATCCTTTTTGTTTATCTTCACCTATGA 

TACCTACTGTTTTATTGTCTTTGATTGAATTCCATATGTACTCTGAGGCAT
ATAAATTTTTGATAGCTTTGTCTTCATAAATACCTCTACCTGTTTCGTCGT 
AAGCTAGTTTAGCTAAATGCATATGCTGATCAACAGCAGCCATGCTCATCT 
GATGTACAATGTCATTAATCTCATGTTGTGATTTTTTAGATAGTTCTTTTA 
GAGCTTCTTGTCCTTTTTCTGCTAATGAATCAATCATTTGTGTGACTTCAT 

CTTTGTTTGATTCATATGTATTTTTTTTAGTTACAGATAACATATACAACC
ACTCCTTAGATATTGTGAAATAATTCACAAACATTATAGTACAGCTCTTTC 
CAAATTAAAAGAAAATACAGACTATTAAAGAGTCAATTGTGATGTTTATAT 
GAACACATTTTTAACCTATATATTTAGTCGGGGTTGAGGTGTAAGACTTGA 
TGCAATGGGAACAAGACGTATTTGGTAGTTAAAAAAATCTATTTTTAAAAG 

AAGTAAATTTTAAAAAAATTATTGTAAGTATTCTTACTAAAAATAATGAGG
GGTGTTGTAATGAAATTATCGTCTCACCCGACTAACGATGTTATCAGTTTA 
TAAAATAAAAAACTGAGACAATCATTTATGTCTCAGTAGCTCATGCTTTAA 
AAATAAATTGATAAAGAATAAGTGAAATTTATATTAATTATTTTGTTCAGT 
TGCAGTAAGAAGTTGTATCGCTTTGGGTAATACTTGAATCTCAATTGGTGT 

TTCTAAATTAATTTCGCCATCAATATCCACTTTCATACTAGGGTTTGTTGA
GAGTGTGATGTGCTTACCTGATATGTGATCAATACCTTGTGTGATTTCGTT 
CCAATTCATACTATCACGTTTTTTTAATATATCATTCAATATATTTAGTGT 
TTGATCATTAAATACAAATGTGTTTGCTCTTCCATCTTGTGGCGATAAATC 
GGTTAGCGGAATTTGTCCACCACCTATATTGGGACCGTTTGCTATTAACAT 

CATCGAAGTATTGCCTTCTTTTGTTTCACCATCAACAGTCAATGAGAAATC
AAATTTAACAGGATTTAATAACGTTTTAACGGTTGATCCAATATAGCTGAA 
TTTACCGAATATATCTTTAGAACCATCTTGAACATTCTCTGCATTTTGTAC 
TATTAAGCCAAGTCCAACGAAATTAAGTACATATAAGTCGTTCACTTTTAA 
CACATCATATGATTCAGCATGTGATGTTAATAATTGCTCACTAGCTGTTTT 

AAAATTAGGGTGCAGTTGAAGTGTTTTTGTAAAATCGTTAAAGGTACCACC
TGGTATTACACCGATTGGTAAATTTAACTGATACTGCATAACGCCATTTAC 
TAGTTCATTAAGTGTACCATCTCCACCTAAAATAAATAAAACGTCTACATC 
AGAGCTATAATTTTCATTTTTAATAGATTTACAATATTTAATAATATCGCC 
TTTATTTTCACTGAGTTGAAGAGAGAGGTGTTTACACATTGAACTAAGAGA 

TTTTGCAACTTCTCCTATGCCATTATGTATATCTTTCAAACCGCTATGTTC
ATTCAAACTTGGCACX3GTACACCATTAAAACGTTTAGCGAATGATATGAAA 
GTGGTACGTATGCCX3GGTACTACGACACCAAAGTATAAGCGTAATTTTAAT 
CGTGAAACATCACGTTGGGATTATTTAATTTCGCCAAATAGATATTCAACT 
GAAATATTTAGAAGTGCTTTTTGGATGGATGAAGAAAGAATATTAGAGATA 

GGTTATCCAAGAAATGATGTATTAGTTAATAGAGCCAATGATCAAGAGTAT
TTAGATGAAATTAGAACTCACTTAAATTTACCTAGTGATAAAAAGGTTATT 
ATGTATGCTCCGACATGGAGAGACGATGAATTTGTGAGTAAAGGAAAATAT 
TTGTTTGAATTAAAAATTGATTTAGACAACCTTTATAAAGAACTCGGAGAT 
GATTATGTGATTTTATTACGCATGCATTATCTCATTTCTAACGCACTTGAT 

TTATCTGGTTATGAAAATTTTGCAATTGATGTTTCAAACTATAATGACGTC
TCTGAATTATTTTTAATAAG 
CGCCACTGTGCTCTAAAGATATACGTGTCGCGCAACTTGTTTCTGTCGTGC
TTATTATGATTAGTGTGATATTCGTAATTTATAGAAGAGTTAAATACCAAC 
CAATAAAATATGAAAATTCAGGACCATTAACATGGCCGATTAAAAAGGCTA 
AGTGATTATTTTGAGAAGATTAAGTAAAGAGAAGCATCATCAAGTGAATCC
TTTATGGCATATATATAGAAAAATTAAATTCGTAAAAGTGCTAAAGCAAAC 
GTTGATTATAGAAATTGCAAGATTTGTTCCTAGTATGAAACTTAAAAATAA 
AATATATAAAAAGCTTTTAAAAATGGATGTCGGGAATCATACTTCATTTGC 
ATATAAAGTGTTGCCTGATTTGTTTTATCCAGAATACATTTCGGTTGGCAA 

GAATACAGTCATTGGTTATAATACAACAATATTAACTCACGAAGTACTTGT
TGATGAGTGGAGAGTAGGAAAAGTCATTATAGGCGATTACACTTTAATAGG 
TGCAAATACGACAATATTACCAGGAATAACCATAGGAAATCATGTTAAAAT 
TGGTGCGGGTACGGTTGTGTCTAAAGATGTTCCCGATTACAGTTTTGCATT 
TGGTAATCCTATGCAAATACAATTAGATTCAGGAGGTGACAATGAATGGCA 

CAAAAAGAAAATAACATCATTCCAATGATTTTTGATGAGGCTTTTTATCGT
AAAATGGCTACCCAAAAGTGGCGACAACAGGACTACAAGAAGGCGGCAGAA 
TATTATGAAAAAGTGCTTGAGTTGTCACCTAAGGATTTTGATATTCAACAA 
CACTATGCTCAATGTTTAGTTAAATTAAATATCGGCAAAAAAGCAGAACGT 
CTATTTTATGAGAATATTGTTAAAGACTTTCATGTAGCAGAAAGTTTCTAT 

GAACTAAGTCAATTAAATATAGAATTAAACGAGCCAAACAAGGCATTCTTG
TTTGGTATTAATTATGTAATATTGTCAGAGGATAAAGATTTTAGAGAAATG 
CTTGAAAAAACATTCGAAGTCACGTATACAAAT6AACAAAAAATTGAATTA 
GAAGCACAATTGTTTTCAACACAACTTTTATTTCAATTTCTCTTTTCGCAA 
GGTAGGTTAGAAGAAGCCCGAACATATATTTTGAATCAATCTTACGAGATA 

CAACAGCATAGGGTGATTAGGAATTTACTTGCAATGTGTTATTTGTATCTA
GGTGAGTATGATAGCGCCAAAGCAATGTTTGAAGAACTTTTAAAGGAAGAT 
AATTCAGACGTGCATGCACTTTGTCACTACACATTATTACTTTATAATAAA 
AAAGAAACAGAAAAATATCAAAAATATCTTAAAATACTTAATAAAGTAGTA 
CCACTAAATGACGACGAAACCTTTAAATTAGGAATCGTATTGAGTTATTTA 

AAACAGTATCGTGCTTCTCAAAATTTACTTTATCCACTTTATAAAAAAGGT
AAATTTGTCTCTATTCAAATGTATAATGCATTGAGTTTCAATTTTTATTAC 
CTAGGAAATAAAGACGAAAGTATTGAGATGTGGAACAAGCTCACTCAAATT 
TCTGAAGTTGATGTTGGTTATGCACCTTGGGTAATTGAGGAAAGTAAAACG 
GTATTTGAATCACGAGTGTTACCATTATTACTAGATGATAATAATCATTAT 

CGACTTTACGGTATTTTTTTACTTCATCAATTAAATGGAAAAGAAATACTA
ATGACTG7VAGATATTTGGTCAATTCTTGAATCAATGAATGACTATGAGAAA 
CTTTATCTCACATATTTGGTACAAGGACTCACACTCAATAAATTAGATTTT 
ATACACAGAGGTATGCAAAGGTTGTATAATTTTAAGAAATTCAAATATAAC 
ACGTCTTTATTTACAGATTGGATTAATCAAGCAGAAATGATTATAGCTGAA 

AATGTAGATTTAGTAGATGTCGATAGATATGTAGCTGCATTTGTTTACCTA
TCGTATCGTCGTTCTAGCCAACCACTTACCAAGAGGCAATTGATGGACGAT 
TTTAATGTTTCTAGATACAAACTGAATAAAGCAATTGAATTTATATTGAGC 
ATATAAATTTATGAAATGAAAAGTATGATATATAATGCGCATAATGATTAA 
TAAGTAGGAGGCATTTAAAATGACTGAAGTAGATTTTGATGTAGCAATAAT 

CGGTGCAGGTCCTGCCGGTATGACAGCAGCAGTATATGCATCrrCGTGCCAA
TTTAAAAACTGTCATGATTGAACGCGGTATGCCAGGCGGTCAAATGGCAAA 
CACTGAAGAAGTAGAGAATTTTCCAGGATTTGAGATGATCACAGGTCCTGA 
CTTATCTACTAAAATGTTTGAACATGCTAAAAAATTTGGTGCGGAATACCA 
ATATGGCGATATTAAATCTGTTGAAGATAAAGGCGACTATAAAGTTATCAA 

TTTAGGGAATATAGTTGTTGATGTAAATGTGTGTCACAGTATTGTTTAACA
GTATTTTCATAACGTAATAACAATGTGTCAATCATCGGTGTTGCAACGTTT 
AATGCTCTTCCAATCGCTTGAATTATCAAAGTACGATAATAATCTTCACTC 
GGCATGCGTGGTATATGTATGACTCCTTGTTCATCAGTATCAACATGTTTG 
TATGGTACGGCAGAAAAATCAAAATATGCACCTTGATCGTCCGGATTAGAA 

AACGGATCGATTAAAATTGCTGTATATCGCACATAAAGTAGATACTCTTGA
TGAATAGCTGGTAAATTTTTAAAGTTTTCAATATCTACTTCGCGCATGGTC 
TCATAACGTATAGGGTAGTTTTCTTTCACCATAAACTTTAGAAGATTGACC 
GAAGGTACCTTTAATTTTTTTAATATCATCATCATTTCTTGCCACATTAAT 
CGCATTTCGTGTATTAAGGTCATTGTGATTGGACCCTCTGGAAATAGCTTA 
TATACATATACTGGTACTTTCGTCCCTTCAAATACCGCCTTTAATGAAAAT
TGATTCATAAACAATGGTGGGTGTACATAAAGTGAACTATTATGTATCTCC 
GCATGTAGTGGTGTGTCCATCGTTGTTAATTCTATATTCAAATAGTCAAAT 
AAAGACTTAAGCTTACACAACGTCATAGACTGAGATTGAGTCGAACCTACG 
AATAATTTTGATTTAACTCGTGTGGTTAGGACACAATGTGGTTGTGCTTTA
TCAAATATTCGGGTATCGCCTAGATAAGTGGAAAATGAAATCACTTCACCT 
TCACATTGAACATCTGATAGTAATTGCTTAACAAGCATATGTGATCCTAAT 
GTTGGTGAGACCAAGATGATTTGCTTAATACGCTTTAATGTGGACTTAGAT 
AATTGCTGTAATATCGGTCGATACGCATCGGCAGTACATGCTAAAATCACC 

ACGTCATAATATTCAGTAATATCTTTAACATCTTTAAAAAAATGTCTAACC
GTAAACTTACCTGAAAAACACTGATGTGCATCATTTTGAGTCATTACTGAA 
AAAAAGCCGTCACGTTGATAAGCATCAAAGACTCTCTTAGATTTGGTTGAT 
GCATGAACGCGACTCACCATATCAACTATATGTTCTCCATGTAAATGACAT 
AGTCGAGCAAGTTGAATAGCGACCGGTCCACTACCAACCATTAATATTTTA 

GACATGTTCAACACCAACTTTTGATGCCACTTTACAATAAAGTGCAATATC
AAAAATCTGCTTCGGTCGTGTTTGTAGTTCTTCACATCGCCACTGTGTTTC 
GTCAGTTGCTTGAGATGGATAATTAAACAATGCTTTTATATCATCACCGTA 
TCGCATAGCTACAACAACTTCATCATTCGTCAATGTGTAAAGTTCATTCAA 
AATGTCATATTTGATAGGTATCGTCGAACTGAATATAATATGTGTCACAGA 

TTGG



Sequence 3409 
step.1002f12.cons.ok

AAGATGTAAAAAAACAAAATGATTATTTAGATAAAATTTTACCTGAAGCTT
ATGCACTTGTACGTGAGGGGTCAAAGAGAGTATTTAATATGATTCCTTATA 
AAGTACAAGTAATGGGTGGTATTGCTATACATAAAGGTGATATTGCAGAAA 
TGAGAACAGGTGAAGGGAAAACATTGACTGCAACCATGCCGACGTATTTGA 
ATGCTTTAGCTGGTAGAGGTGTACATGTTATTACAGTCAATGAATATCTAT 

CAAGTTCACAAAGTGAAGAAATGGCTGAACTATATAACTATCTTGGCTTAA
CTGTAGGTTTGAACTTAAATAGTAAGTCAACTGAAGAAAAACGTGAGGCTT 
ACGCACAAGATATCACTTATAGTACGAATAATGAACTTGGGTTTGATTATC 
TTAGAGATAATATGGTGAACTATGCTGAAGAGAGAGTAATGCGTCCTCTAC 
ATTTTGCAATTATTGATGAGGTCGATTCCATATTGATCGACGAAGCAAGAA 

CACCTTTAATTATTTCTGGTGAAGCGGAAAAATCTACTTCTTTATATACTC
AAGCAAATGTTTTTGCAAAAATGCTTAAAGCGGAAGATGATTATAATTATG 
ATGAAAAAACCAAAGCTGTACATCTTACAGAACAAGGTGCAGATAAAGCTG 
AACGTATGTTCAAAGTAGATAATCTTTATGATGTTCAAAATGTGGAAGTGA 
TTAGTCATATTAATACAGCTTTAAGAGCTCATGTTACTTTGCAACGCGATG 

TTGATTACATGGTCGTTGACGGTGAAGTATTAATTGTTGACCAATTTACTG
GACGTACAATGCCTGGACGTCGTTTTTCTGAAGGTTTACACCAAGCAATTG 
AGGCTAAAGAAGGTGTAGCAATTCAAAATGAGTCTAAAACGATGGCATCCA 
TTACTTTCCAAAACTATTTCAGAATGTATAATAAGTTAGCGGGGATGACTG 
GTACAGCGAAAACCGAAGAGGAAGAATTTCGTAATATCTATAATATGACAG 

TTACCCAAATTCCAACAAACAAACCTGTTC7ACGTAAAGATAATTCAGACT
TAATTTATATTAGTCAAAAAGGAAAGTTTGATGCGGTAGTTGAAGATGTTG 
TAGAAAAACATAAAAAAGGACAACCCGTCTTACTAGGTACTGTTGCTGTTG 
AGACTTCTGAATATATTTCAAATTTACTAAAAAAACGTGGTGTCAGACATG 
ACGTATTAAACGCTAAAAATCATGAACGCGAAGCTGAAATCGTTTCAAACG 

CGGGGCAAAAAGGTGCAGTTACAATTGCCACAAATATGGCTGGACGTGGAA
CAGATATTAAACTTGGTGATGGTGTTGAAGAGTTAGGTGGACTTGCTGTTA 
TTGGTACTGAGCGTCATGAATCAAGACGTATTGATGATCAATTACGTGGAC 
GTTCAGGACGCCAAGGTGATAGAGGAGATAGTCGTTTTTACCTATCTTTAC 
AAGATGAATTAATGGTACGTTTTGGTTCAGAACGCTTACAGAAAATGATGA 

ACCGTTTAGGAATGGATGATTCAACGCCAATCGAGTCGAAAATGGTATCTC
GAGCTGTAGAATCAGCTCAAAAACGAGTAGAAGGTAATAACTTTGACGCGC 
GTAAACGTATTCTAGAATACGATGAAGTTTTACGTAAGCAACGTGAAATTA 
TTTATAATGAGCGTAATGAAATCATTGATAGTGAAGAAAGTTCTCAAGTCG 
TTAACGCGATGTTACGTTCTACATTGCAACGTGCGATTAATCATTTTATTA 
ATGAAGAAGACGATAATCCTGACTACACGCCATTTATCAATTACGTTAATG
ATGTGTTCTT6CTGAATTATTAATGTTAAATGCTATCCATGTGCCATTACA 
CACCGGTAAAATTGAAGAAATGACACGTGTACTTCGTGAAAAATAAGCATA 
ATATATTTAAGCGTCTGGATTATCATCCCCAGACGCTTTTAAAATGTTTAG 
AGAATTATCAAATTTTATTTGAATGCTTTCGCTAAAACAAAGATAAACATA
TCTTTCGAGGAGTTTTATCTAGTATACGTACATTCTTCCCTCAACAAGCTA 
AAAACATAACTATCTGAGAATTCCCCTTGTAATCGTTCATTACTTCTCAAT 
ATACCTTTGCGTGAACCCAAGTCTAATTGGTATAGCTCTGCTTTTCTTATT 
TTTAGTAGACATTCGTATTTCTATACGATTAATATCATACACTTCATATAC 

ATAGCGAATTAACGCTTTAGTACATTTAGTCATAATACCTTTCTTTTGAAA
GTCTTCAGCTAAATAATAACCAATTGAAGTTGTTTTATTAACTAAATCTAA 
GTAATGCAATCCTATGACTCCAATCAATTCTTTATTACTCCATATTCCACA 
ATGAAATCCATTACCATCGATAAATTGTTGCAACGCCGAATGGATAAAGTG 
TTTACTATCTTCAACTTTCTTCGTATGTTCAACAAAAGGCAGAAATTCAGC 

TAAATAGTCACGATTGCTATCTACTAATTTAAATAACTGTTCGGCTTCTCG
TTCTTCTAATATTTTTAATTTAATATTTTTATTTATTTGATAACTAAACAT 
ACAAAATCATCCTTTCTTAGAATTTTCTGTCATTTTAGTATACGCGAGTAT 
AGGATTATGTCATAAAACTTGATTACTAGAATTCGATATTCTTTAAGTTAC 
AAGTGTGTATGTAGTATTTCTCTTGCTTCTATTGTCTTATACTTACAACCA 

ATATTAGAGTGAAAAAGCTTAACTTAAGATAGATAATCATCAACAACACAA
ACGTTAAAAACACTGAGAAAACTTTTTTAAAATATACTTCTCAATATTGAT 
TAGTAAAATCCAAAATATGAATAATAAGTAGACATTAAACAATCTTTATTT 
TATTCTATAAATAATATAAAATAAATTATTAGATTAATACATTATTTTTTT 
ATTTCTAATTTAAGTTGAAATCGTATGTAATCGGAGGAAACCATAATGATA 

TATTCTATCGGTAAGAATTTAGGTAATAAATTAACAGGTATAGAACAAGCT
ATGATCAATAGATTAAAGCTATTTAAAGATAATTTAGTCCCAAATAAACTC 
ATATTCACATCTTGGTCACCACGTTTATATATGCATGCACATTCGTTAAAC 
ATCGATTCAAAAGATATTTTCAGTCTTTACGATTTTCTACAAGATAGTATT 
AACTTTGAGAAAAAACATATTGATTGGATAAATTATTGGCAAAATATATGT 

AATTATACCTTAAAATTCGTTGAAAATACGAATGATATTAAAATATACGAT
AACGACACATATAAAATGTATGTGCATTTTGTTGATTCAAATTATCAAACT 
TTAGACTATATTAACCATTTTGATATACAACAACGTAAAATTCGAAGAGAT 
TTTTACGATACAAGAGGCTTTTTAAGTTGTAGTAGAATTTTAACCTCTCAA 
C 


Sequence 3410

step.1002g01.cons.ok

TAACCGTAGGTATAGTGGTTAGCTTTACATAATTCAATGACTTTTTGTGTT 

ACGGTATCATTTTTATAGGTTTTGTTTTTCCATCGGTAATATGTTGATTTA
GGTATGTTTAATACTTCTAGTATCAATTTGATTGAATATTTTACTTTTAAT 
TGATCCACTAAATCTATGACTACTGTTGGTACCACTTCCTTTCCAATGCCT 
TGTACTTTTTTAAAATATCCAATTCTATATCTTTTCTCTTATTTTCTAATT 
TTAATTGTTCTACTTCTGACAGCTCTTCTAATCCTTTACCGTAGGTATATT 

GTTTACCAACTTGTTGTGAAAATCTATAACTTTCCCCATTTCGATACCATC
GCCACCAAGTTTTCACTTGCGTCCTATTTCTAATATTTAATTCTTTCATAA 
TTTCTTTTGTTGA/^TCCTGCTGCTTTCATTTCAACTGCTTTATACTTTC 
TTTCTACTGAATAAGAAGCTCTTTTCATAGAAAAAAACCTCCGTATGATTC 
ATTTTAATATGAATTCAACGAAAGTGTTTTTATATAATTCCCACAAATTGG 

GGTCAGTCTAATAATGAATTTCGTGATTGTAGGCTTTTTATTTTATATATT
CATTAATTTCTCTTCTTGACGGATACGTACAATTAAAAAATAAGCATATGG 
AACTAATAAGAGTGTTGTGTATGTAGCATTTGTTAGTAGTAATACACCAAT 
TAATTCGGGAATGATATTTAAAAAATAGTTTGGATGTTTCGTTACTTTATA 
TAACCCTGACTTAATAATAGGGTGATTGGGCAGTATAAATAATTTCAATGT 

CCAAATACGACCTAATGTCTTAATAACTATAAATAGCATGATATAAGCGAT
GATCAATATGATTAAGCCAATGCCGTTAAGTAGACTAAATGTATCCTTACG 
AATGAATGCTTCTATAGCTGCACTCATGTAAATTAATACATGCGTAATGGC 
TAGATATTTTGAATTTTTCACACCATATTCCACCGCGCCCTCTACCTTTAG 
CTGTTTTGCGTGTTGCATAGATATCTTTAAGCTGATGAGTCGAATACAGAA 
AAAGATAAATAAAATAGTTAAAATCATGATGTCCTCCATTGTTTGTTATAT
ATATATTCTTTTGCTCGTACAATACAAAGTATTTTATATAGAAAATGTTTC 
AAATGCCATTAACTATTTATTAACTTTAGCACCAACTCATAACTCTTCTAA 
CACCTCACATATTAAGTTCAGTTTTTCGGATAATTTAATACTTTTAAGGAT 
ATTAAGCGCTTACATTGATGTGATATATTTTTTTTAACGAAGATGATAGGG
GTGTTAGGGATGAACTTTTTTGATATTCATAAAATGCCAAACAAAGGGATA 
CCATTAGCTGTACAACGCAAATTATGGCTCAGAAACTTTATGCAAGCGTTT 
TTTGTCGTATTCTTTGTTTACATGGCGATGTATTTAATTCGAAACA^ 
AAAGCGGCACAACCGTTATTAAAAGAAGAAATCGGATTAACAACATTAGAA 

CTAGGTTATATAGGATTAGCGTTTAGTATTACTTACGGTTTAGGAAAAACA
ATACTCGGTTATTTCGTTGATGGGCGTAATACGAAACGTATTATTTCCTTC 
TTATTAATATTATCTGCGATTACAGTACTTATTATGGGATTTGTATTAAGT 
TATTTCGGTTCTGTGATGGGGCTATTAATTGTATTGTGGGGGCTTAACGGT 
ATATTTCAATCTGTGGGTGGGCCTGCAAGTTACTCAACGATTTCAAGGTGG 

GCGCCTCGAACAAAGCGCGGTCGTTATTTAGGCTTTTGGAATACATCACAT
AACATTGGTGGTGCTATTGCTGGTGGTGTCGCACTTTGGGGCGCGAATACA 
TTTTTCCACGGTAATGTGGTTGGAATGTTTATTTTTCCTTCCGTCATCGCT 
TTAATCATTGGGATTGTGACATTATTTATTGGTAAAGATGATCCAGAGGAA 
TTAGGTTGGAATCGTGCCGAAGAAATTTGGGAAGAGCCTATCGACCAAGAA 

AACATTGATTCTCAAGGTATGACTAAATGGGATATCTTTAAAAAATATATC
CTTGGAAATCCTGTGATTTGGATTTTGTGTATCTCTAATGTTTTTGTATAT 
ATCGTGCGTATTGGTATTGATAACTGGGCACCGCTATACGTATCAGAGCAT 
TTACATTTTAATAAAGGTGATGCGGTGAATACTATTTTTTACTTTGAAATA 
GGTGCATTAGTAGCTAGTTTATTGTGGGGCTATATCTCAGATTTATTAAAA 

GGTCGTCGTGCGATTGTAGCGATTGGATGTATGTTTATGATCACCTTTGTT
GTACTCTTTTATACCAATGCAACAAGCGTGACAATGGTCAATATTTCTCTA 
TTTGCATTAGGCGCTTTAATCTTCGGTCCACAGTTACTCATTGGTGTATCT 
CTGACTGGCTTTGTTCCTAAAAATGCAATTAGTGTCGCTAACGGTATGACA 
GGTTCATTTGCATATCTATTCGGGGATTCAATGGCTAAAGTGGGTCTGGCT 

GCAATCGCTGATCCAACACGTAATGGTTTAAATATTTTTGGGTATACGTTG
AGTGGTTGGACAGATGTCTTTATTGTATTCTATGTAGCTTTATTCTTAGGA 
ATGATATTATTAGCCATTGTTGCTTATTACGAAGAAAAGAAAATTAGAAAA 
TTAAAAATTTAAGATTAAGTGAATTTAAATATATACTCCCTACTATAAATT 
TTATCATTATGGTAGGGAGCTTTTTCGTGATTGATTTGCACTATTTTGATT 

GCTTATTATAATCACTTGGTGACATATGTAAATATTTTTTAAAATGATAGC
AAAACATTTTATACTCAGAAAAACCTACTTTTTCAGCAATTTCAT 



Sequence 3411 

step.1002g02.cons.ok

TTAAGTTGTTGTAAGTTATCTACTTGTTCAATCCATTCGCTATCTATCGCT 
TTCAAAATTGATTTTTGAATGAAACGAAGTTTTAAATATGAATCAGGAGCA 
ACTTCTAAACGATTGTTAAATTGTTGAGTAAATTGTTGTATTAAGAATTGT 
ATGATTTCTTCATCATTTTGCATATTAATATTTGATACATCTTCATCGAAG 

ACAAAACTTAAGTTTTCGTATATATAATTCACAAGTGCACGTTCACTACTT
AAGTCAAGATTTTTAACGTCTTTTGTAAACACATCTCGTGCAAGCTGTTCA 
AAATTAAAATCATCAAAATCGCTTGCTTCAAGTATGTGATTACGTTCAGCA 
TAAATTTTATCTCGTTGAACACTAATACTTTTTTCGAATTCATTTGCCATT 
TCTCTATTTTTCATAGCAGTCTCTTCAGATACACGTTGCGCTTTATTAACA 

ATTGACTTTACACGTTTTTTAAAGAGTGCACTACTTTCTAGTTTAGATGCA
TCCATCGTTTGGAGGTTTTTATTTTCTGCCAAGTTAGAGTTACTCCAACGT 
TTTACTAAATCATCATCAAGTGATACAAAAATCTGTGAATATCCAGGATCT 
CCTTGGCGACCTGAGCGTCCTCTTAATTGACGATCAACACGGCTATTATCC 
ATATGTTCATTAATAATCACTGCTAAGCCACCGATATCATGAACCTCTTTT 

GATAACTTTATATCAGTTCCACGCCCTGCCATACTTGTAGCAACAGTAACT
GCAGATAATTGTCCCGCCTCAGCAATCATTTGTGCCTCTTTAGCTACATTT 
TGAGCGATTAATAAATTGTTGGGTATATCACGTTTAAATAACTCAGCTGAA 
AAATATTCTGCCGCTTCTGCAGTACGTGTAATTAGTAACACAGGTTGTTGA 
GTTTCATGTATACCAATCACTGTCTTTAAAATTGCATCGTTCTTTTTGTCA 
CCATTAGCAAATACTCTATCAGGTCTATCATCTCGTTCAATCGGACTGTGA
GT 



Sequence 3412

step.1002g03.cons.ok

GTATTCGGTAATATCATCTGTAAGATTTTTAATTTCTTGTCTTGCATTTTT 
CATTGCAACAGTGTGATGGCCAAAATTAAACATAGCTCTATCATTATCACT 
ATCACCAATAACTAAAGTTTCTTCTTGCTTAATACCATAGTGATCAATCAT 

TTCCTTGATACCTGTACCCTTATCTGTATGATATGGCATCGTTTCAGCATT
ATAACGCGAAGAATTAGATACGGTAATACCTAAATTTGATTGGTTTTCAAC 
AAGCTGGTTGCGAAAATCAGTTATTTTATCTAAGTTGGGACTAAATAAATA 
TATTTTAGAAAAGTGTGCATCAGGTAAGGTATCTCGCCAATCTATTTTTCC 
TTTAATTGCCTCTCTTCGCGATGACCACTCACTTTGACTTACAGCGTCAGG 

TGGCTCTATAGTGGAAATCATTTCTTTCATCCAATCTTCATCTTCTTTAAG
AGATATACGATTACTTTCAAAAGGAAAAACTTCATAATAAATATGTTGTTT 
TTTAGCCAATTCCACAATTTTCTGTACTCGTTCTAAAGTTAAACTATGTCT 
AAACAAATTATCTCCATGAATTTCACCTGAAGTTCCATTCGAACTGATAAT 
ACCATCTACAGTGAATCCATCAGGAACAAGCTGACTGATTTCTGAATAAGA 

TCTCCCAGTAGCAAGGAAAACTTTATAATTTTGCTCTCTCAATTCATTAAT
CACTTGTTTAGTATATTCCGAGGCTTTATTATTTTCATGTAAAATCGTTCC 
ATCCATATCTAAAAAAATTGCTTTTATATTTTTCATATTATCCTCCATTCA 
AGACTCTAACTTATACACTTAAATTGTGTCACTTATCATACTAAAGTGCAA 
CGGCTTTGATTACACTTTGAACATACTATAACGTTGTCTTCTCTTATTTAC 

AATGACATCCCTATCAAAAAAATTGTCATCCGATAAAGATCTCATAACATC
TTAAATATTAACCTTGTTAAATATTTAAAAGTCTTTAAGTAAATCTTTTTC 
AAAAATAACGGCATTTCTTCTATTATATGTGTTACAAGTATGACAGAAAGT 
TGTGAATCATTTAAAATATGCTTACCAAATGTTTTAAATAGTTGTCTCTTT 
ATATAAAATCCATTCCAGTTGTAGGTTTATCCAATACTAGTTAATTTAGGG 

TACTCATTAAAGTCATTTAACAACGGCATTAAACCGCTTCGTTAAAATTAA
TTGAAGGATAAATCTAAAATTAATTCAAAAATAAAGAGTAGCCATTTAGGC 
TACTCTAAATTATAAACATTTTACTCTACTGTAACCGATTTGGCTAAGTTA 
CGAGGTTTGTCAACATCTAAATCTCGTTGTAACGCAGCATAATATGAGATT 
AATTGCATTGTCACTACAGATACTAAAGGAGTTAATAATTCATGTACTTGT 

GGAATCACGTATGTGTCTCCTTCTTTATTCAAACCTTCCATTGAAATCATA
CAAGGATATGCACCACGTGCTACTACTTCTTTCATATTTCCACGAATTGAT 
AGATTAACGTTTTCTTGTGTAGCTAAACCTATAACAGGTGTGCCATCTTCA 
ATCAAAGCGATTGTTCCGTGCTTTAATTCCCCACCTGCAAATCCTTCAGCT 
TGAATATAAGAAATTTCTTTTAATTTTAATGCACCTTCTAAACTAACATTA 

TAATCAATTGTTCGTCCAATGAAGAATGCATTACGAGTAGTTTTTAAGAAA
TCCGTTGCAATTTGCTCCATCTTAGGTGCATCGTCAACAATTGTTTCAATA 
GCTGTAGTAACCTTAGCTAGTTCTCTTAATAAATCAACATCGGTTTCACGA 
CCATGATTTTTAGCAACAATTTGAGATAAGATAGATAAAACAGCAATTTGC 
GCTGTATATGCTTTTGTAGATGCGACTGCAATCTCAGGTCCAGCATGTAAA 

AGTAATGTATGATCCGCTTCACGTGATAATGTTGAACCAGCAACATTAGTA
ATTGTTAATGATTTGTGACCTAACTTATTTGTTTCAACTAATACAGCACGA 
CTATCAGCTGTTTCACCAGATTGTGAAATATAAATAAATAGTG6TTTTTCA 
GAAAGAAGTGGCATATTATATACAAATTCAGAAGCTACATGAACCTCAGTA 
GGTACACCTGCCCATTTTTCAATAAATTCTTTACCAACCAATCCAGCATGA 

TAACTAGTACCAGCTGCAACGATGTAAATACGATCAGCATCTGCTACATCA
TTAATAATCTCTGAATCGATTTTTAAATTACCTTTTTCATCTTGATATTCT 
TGAATAATGCGACGCATCACTGCAGGCTGTTCATGAATTTCTTTTAACATG 
TAATGATCATATACGCCTTTTTCTGCATCAGCAGCATCTATTTCTGCCGTA 
TACGTATCACGTTGTTGAATGTGCCCCTCAAGATCTTTAATTTCTACTGTG 

TCTCGCTTAACAATAACTATTTCATGGTCATGTATCTCTTTGTATTGGTTT
GTAGTTTGTAACATTGCTAGAGCATCAGAAGCAATAACATTGAAACCTTCA 
CCTACACCTACTAAAAGCGGAGACTTGTTTTTAGCCACATAAATAGTATCT 
TTATCATTATCATCTAATAATCCTAAAGCATATGAACCATGTAATAATTTA 
ACTACTTTTGTAAATGCATCTTCTGTAGCTAATCCTTGTCTAGAAAAATAA 
TCTACTAATTGTACAATAACTTCCGTATCAGTTTCTGATGTU^AAAGTGACA
TCAGATAAATATTCAGCTTTTAATTCTTCATAATTITCAATTACACCATT^ 
TGAACTAATGTAAAACGTCCAGATGTTGACTGGTGTGGATGTGAATTTTCA 
TAATTTGGAACACCATGTGTCGCCCAACGTGTATGACCAATTCCTAACGTA 
CCATCCTCATCACTATTATCTGCAAC



Sequence 3413 
step.1002g04.cons.ok

ACTGGTCCACACGTACACTTCCAAGTTGAAAGAGGTCGCCATGATGACATC
ACAAACAGAGGGACAGTAAACCCTGCTAAATGGCTCAAAGGTCACGGTGGT 
GGAAAAGTTGGTGGTAGTGGTTCTGTAAACGCACGTAGAGCAATTCAAAGA 
GCACAATCTATTTTAGGTGGACGTTATAAATCGTCTTATATTACCGAACAA 
ATGATGAGAGTTGCCAAACGTGAGTCTAACTTCCAATCAGATGCGGTTAAT 

AACT6GGACATCAACGCACAAAAAGGAACGCCTTCTAAAGGTATGTTCCAA
ATGATTGAACCATCTTTTAGAGCATATGCTAAACCAGGACACGGAAACATC 
TTAAATCCAACTGACGAAGCTATATCTGCTATGCGTTACATTGTAGGTAAG 
TGGGTTCCTATTATGGGGAGTTGGAGAAGTGCATTTAAACGTGCTGGAGAT 
TATGCTTATGCTACAGGCGGGGTTATTAACACTGCTGGATTATATAATTTG 

GCAGAAGATGGATACCCTGAGATAGTAATCCCTACAGATCCAAGCAGACAA
TCAGATGCGATGAAATTGTTACATCTTGCTGCAAGTAAAATTAGTGGAAAT 
AACAGAAATAAACGACCTAACCAATTACGTACACCTAATGTTACTAGTAAT 
ACAGTTGATAATGCAGAATTACTACTACAAATGATAGAAAATCAACAGAAA 
CAAATAAACGTGTTAATGGAAATAGCACGAAGTAATAAAACTATTGAAAAA 

CAACCGAAAGGTTTTTCAGAACGCGATGTAAGTCAGGCACAAGGTTCAAGG
TTAAGACTCGCTGCTTATAGCCAGGGAGGTTTATAAATTGGAAAATAAAAA 
AGTAAAAATATTTAACGATCATTTCGAAGAAACACTAACGGATATTCCTCA 
TCTTAAGTTTCACAGAGGATAAATATCACATACAGATTTTGCTATACGCAA 
CATTGTTAAAACATGATAACTCAATCCACTCGCGAAATTATGATGATGAGA 

ACTAGCTGCTGGATAAGTGAAAAAACGATCTTGATACTTTTTAATTAAATG
TCTAGTAATGCGTTGTAAGTTAGCATTTTCAATATCTAACATAAAATGCGA 
TAATTCCTCTTGTATTTCTGTAGGTGACATTGGCGCACCGTCAACAAAGTC 
TTTAGTTGACACTTTGTCTTCAGCTTGTGCAAGACGTATTTGATGTATTTT 
CATCTGTTTGCGTCCACGATAATTGATGATATCACCTTTGACATGAACTAT 

TGTTTCTGGTTTTAAAGTTTGCATATCATCTTTCGTAGCAGTCCATAATTT
AGCTTCAATATCACCACTTTTATCTTGCAGAAATAGTGTCATGTAATCTTT 
ACCCTGAGCTGTAACACCTTGAGTTGCACGATGGATCAAGAAAAAATGATC 
CACTGAGTCGCCGGGATTTAATTTTTCTACATTTCTCATTTTTTCCCGCCT 
TCCTGTAATTTATTTAAAGTTAATACTTCTTTAGCCGGTATGACATGATCT 

TTTGTACAAGTAAAATAAAGTATTTGATAATGTTCTGATAGTTCTCTCAAA
TATTTCAACATACGTTCTTTACGATATTTATCAAAATGAACAAATGCATCA 
TCTACAATCACTGGGAATGGATAATACGGTTTTAATACTTTAATAAGACTA 
ATACGTAAAGCCACATATAATAATTCTTTTGTAGATTGACTCAACTCAACT 
GGCTCAAATACTTGTCCGTTAGAATGCTTTACATGTATTTTATGATTTTCA 

GTATAATGAATCATATTGTAAGTACCATTTGTTAAATTTTTAAAAATAGAT
ACAGCTTCATTAATCACTTGTGGTAGACGCTTATCTTTTATTTGCTTGATA 
TGTTCTTCCACTAAAGCTTGCATATAACTTAAGCTTGCCCAATCCTTAGCA 
ATATCGTTAAGTCTATTTTTTAAGCTATAATATTCATGTCTTAATTGTGCT 
AAAGTTCTATCTGTTTCCATATGATTAATCTGAGCATTTAAATCACTAACT 

TCTGCTTGCATTTCTAAAAATTGATCATTGTAATCGTCAACTTGTTTAGCC
AATCTATGATCTTCTTCTTCTAGTTGAGCAGTAGTTTTCTCACTTAATTGC 

GAACTCATTTCATAAGTGTAATTTTGATTTTCTAAATATTGATTTAAATCA

TTAAATCGATTTAAATCACTTTGATATGTTTCAAAATGATGATGATGTGTA 
TAATAATCTTCTTCATTATCTACATCAACATAATCAAATAATTGCTTAATC 
ATTTGATTCGTTTCACTTAAGCGTGACTTCAATTGCTTTAATTCATTATTG
AGTAATTGCGTTTCAGTTTGATTTTTATTCCAAGCCTCATTTTGTTCTTCT 
GCATTTTTTAACCACTGTTTCACATCATGGAAGAAAGACATCTGATTAAAT 
AGTTTTAAATGTGGTTCAGTTACAGATTGTGCACGTTCATAAAAGTGATTG 
ATATCTTTTAGCAGATTATTGCGTTGTTGATTTAAATCAATAATGTATTTA 
TCATGCGCTTTAATTTGACCAATGGTTGAGATACTTTCAACCACTAATTCA
TCAGAAAGTTTTTCTGATAAACGTAACTCTTTTTTTATTTCAATAATGCTA 
TGTTTTAAACTATCTAATCGCTCATTTGCAGTCGTTAATGATTGATTGATA 
TATTGATGTTTTTCTTCAAGTATTTTTTTATTTTTTT^ 
TGTTCACGAACTTGTTGTTGATATTCTAAATCGAAGTCAAGATTATATTCT
TTTTCAAGTTGCGTGAGTTGGTTTTCTAAATCATTAATTTCCTGACTTATT 
GCTGTGCTATAATCTACTGCTTTAGATCTAGAAAAAATGATACCTACTACA 
AAAATCACAGTTAATAGAGCAAATATTATACCAAAAATAAGATTTGCAGTG 
AAAAAAGAAAAAATAGAAAGTGCTGCTGATAGAATAGTCAAAACAATAAAT 
CCTATTCTCAAAAACTTTTGTCTTTTATTTTTTTGTGTTTGTTCTTCTTCA
AAAGTTTCTTTTAACTTTTCATACAAGTTCTCTTTTTCATGTAATTCTAAA 
ACTTGTTGTGTATATTCCTTTTTCTTTTCAAAGGTTTCGTCAGGAACAAGC 
TCATTCTCAACTTGATTAATCTCATTAGAATTAGAGTTTCTTTCA 


Sequence 3414 

step.1002g05.cons.ok

ACTTTGTTGATTCAACAGTCTCTGTAGCTAAAGAAGTTCAAAAGTCCGACG 
ACAATCTAGGTATTGCTATAGATGCCTATGGTGCAGGTAGTTTTATTGTAG 

CAACTAAAATTAAGGGAATGATTGCGGCTGAGGTTTCAGATGAGCGTTCAG
CTTACATGACACGTAGCCATAATAATGCACGCATGATTACTATGGGAGCTG 
AAATTGTTGGAGATACGCTTGCTAAGAATGTCGCAAAAGAATTTGTCAATG 
GTCATTACGATGGTGGACGTCATCAAATCCGCGTAGATATGTTAAATAAAA 
TGTGTTAATAGGAGGAATAAAAATGAAAATTGCAATAGGTTGCGATCATAT 

TGTTACTGATACAAAAATGGAAGTTTCACAACACTTAAAATCACAGGGACA
TGAAGTGATAGATGTTGGAACTTATGATTTCACACGTACACATTATCCGAT 
TTATGGAAAAAAGGTAGGAGAAAAAGTTGCGAGTGGTGAAGCAGATTTAGG 
TGTATGTATTTGTGGTACTGGTGTAGGAATTAGTAATGCTGCAAACAAAGT 
ACCAGGTGTTAGAACTGCTTTAGTTAGAGATATGACATCAGCGCTTTATTC 

TAAAGAAGAGTTAAACGCCAATGTTGTAAGTTTTGGCGGTAAAGTAGCAGG
TGAATTATTTATTTTCGACATCGTTGATGCATTCATTGAGGCAGAGTACAA 
ACCTACTGAAGAAAATAAAAAATTAATTGCTAAAATCAATCATTTAGAAGC 
ACATAAGAATGACCAAGCTGATCCACATTTCTTCGACGAGTTCTTAGAAAA 
ATGGAATAAAGGTGAATATCACGATTAAGGAAGATGTCCAATGATTTTAAC 

ATTAACTTTAAATCCCTCAGTTGATATATCTTATCCTTTAGATCAGTTTAA
TTTAGATACTGTTAATAGGGTATCTCAAACAAGCAAAACAGCAGGCGGTAA 
AGGATTAAATGTTACTAGAGTATTGTCTGAATTTGGTGAAGACGTAATAGC 
AAGTGGTTTTTTAGGTGGAGCATTAGGTCAATATATTGAAGAACAAATAGA 
AACTACTCGTATTAAACAAGCATTTTTCAAAATCAAAGGCGAAACACGAAA 

TTGTATTGCAATACTACATGAAGGACAACAAACTGAAATCCTTGAAAAGGG
TCCCACGATTGAACTTAAGGAATCAGAGGAATTCAAGTCACATCTATTAAA 
ACTTTTCAAAGAAACTGATGTGGCTGTTATGTCTGGTAGTCTGCCCAAAGG 
ACTTAATACTGATTATTATGCGGATATTGTGAGATTAGCAAAAGAACAAGG 
AATTTTGACCATTTTAGATAGCTCTGGTCAATCACTTGAGGAAGTTCTTAT 

TAGTAATGTGAAACCTACAGTAATTAAACCCAATATAGATGAATTATCACA
ACTTTTAAATTACAAAGTAACCAATGATATTAAAGAATTGAAAGCGGCAGT 
AAGTCAGCCAATATTTAATGATATTGAATGGATTATTGTTTCATTGGGCAG 
TGAAGGTGCTTTTGCAAAACATAATCAAAAATTTTATAAGGTGAATATTCC 
CAACATTAAAGTAGTTAATCCTGTTGGGTCAGGAGATTCCACTGTAGCAGG 

AATTGCTTCTGGACTCATTCATCAACAAACCGATGAAGAGTTATTAAAAAA
AGCAAATGCATTCGGAATGCTAAATGCAATGGAACAACAAACAGGTCATAT 
TAATACAGATAAATTTGACGAAATATTCAAACAAATAGAAGTTATAGAGGT 
GTAATTTATGACAAAATCACAACAAAAAGTGTCATCAATTGAGAAATTAAG 
TAATCAAGAAGGTATTATTTCAGCTTTAGCATTTGATCAACGTGGTGCATT 

AAAAAGAATGATGGCAGAACATCAATCTGAAACACCAACAGTTGAACAAAT
AGAACAATTAAAAGTACTTGTTTCTGAAGAATTAACTCAATATGCGTCTTC 
AATTTTATTAGATCCAGAATATGGTTTACCAGCATCAGATGCTCGAAATAA 
TGACTGCGGACTATTACTTGCATACGAAAAAACTGGATATGATGTGAATGC 
GAAAGGTCGTTTGCCAGATTGCTTGGTAGAATGGTCTGCGAAACGTTTGAA 
AGAGCAAGGGGCCAATGCAGTTAAATTTTTACTTTATTATGATGTAGATGA
CACAGAAGAAATTAACATACAAAAGAAAGCATATATTGAACGAATTGGTTC 
AGAATGTGTTGCCGAAGATATTCCTTTCTTCTTGGAAGTTTTAACATATGA 
CGACAATATTCCTGACAATAAAAGTGCAGAATTCGCTAAAGTTAAGCCACG 
TAAAGTTAATGAAGCAATGAAGTTATTCTCTGAAGATCGTTTTAATGTGGA
TGTACTTAAAGTTGAAGTACCTGTGAATATGAATTTTGTGGAAGGATTTTC 
AGAAGGAGAAGTTGTTTATACTAAAGAAGAAGCTGCACAACATTTCCGTGA 
TCAAGATGCAGCTACTCACTTACCATATATTTATTTAAGTGCAGGTGTATC 
AGCAGAATTGTTCCAAGATACATTAAAATTTGCGCATGATTCT6GTGCGCA 

ATTCAATGGTGTTTTATGTGGACGTGCCACATGGTCAGGAGCAGTTAAGGT
ATACATTGAAGAAGGAGAGCAAGCTGCCAGAGAATGGTTGCGTACGGTAGG 
ATTTAAGAATATTGATGATTTGAATACAGTATTGAAAACAACAGCTACATC 
ATGGAAAAACAAATAATGTAAGGGAGGATATTCAAATGAATAGAGATGAGG 
TACAATTACTCGGATTTGAAATTGTTGCCTATGCTGGGGATGCACGTTCAA 

AATTATTAGAAGCTTTAAATGCTGCTAAAGATAGTGAATTTGATAAAGCAG
AACAACTTGTAGAGGAAGCGAATGAATGTATTGCTAATGCACATAAAGCAC 
AAACCAATCTTCTAGCTCAAGAGGCTAAAGGCGAGGATATCGCATATAGTA 
TCACTATGATTCATGGTCAAGACCATTTAATGACAACATTACTTTTAAAAG 
ATTTAATGAAGCATTTAATTGAATTATACAAAAAAGGGAGCTGACCCCTAT 

GAATAAATTAATAGCATGGATAGAAAAAGGAAAGCCATTCTTTGAAAAAAT
ATCACGAAATATTTATTTAAGAGCGATTCGTGATGGATTTATTGCTGCTAT 
TCCAATTATCTTATTCTCAAGTATATTTATTTTAATTACCTATGTACCAAA 
TGTGTTTGGTTTTACTTGGAGTAAAACTATGGAAGGTATATTGATGAAACC 
CTATAACTATAC 


Sequence 3415 

step.1002g07.cons.ok

TTTCATCCTCAATATCCATACCAAGTAGCTCTTCTATTAAATCTTCGTGCG 

ATACGATAGCATCCGTACCTCCAAATTCATCTAAAACAATCGCTAAATGTT
TTCGTGAAACGGTCATTTTACGCAATACCCATTCTGCCCTATTATGTTCAT 
TTACAAATAAAGGGCTTGATGCATAATTAGTAATTGCGTCCTCTTTATTTT 
TACTCCAAGCTAATAAATATTTTGAGTGGAATACGCCGATGATATCATCTA 
TATTTTCATCATATACTGGATATCTTGTATATGGGTTATTCATCACTGTAT 

CATAAGCTTCGTCATATGTTACTTCCTTTGAAAAAGCTACAACATTAATAC
GAGGCGTGGTATCAACATCCTTAACCTTCAATTGTTCAAAGTCCATAACGT 
TTTGAAGTCGAGTATTTTCTATCTCATTAAATGCACCTTCTCTACCCGCAA 
TATTTAATAATGTACGAATTTCTTCTTTAGAAAATCTTTTTTCAACAGGTT 
GGCCTCGAGATAGTAAATGATTAATACCATCTGTCATCTTGTTTAATAAGA 

TTGTAATGGGCTTGAGTACAATAACACATATATGAATGATAGGATACACAA
GTTTTGAAATTTTATCAGGAAATGTTGCAGCAATTGATTTAGGAATCACTT 
CAGAAATAAGTATAATCACAATTGTTACAATTGCTGATGCTATTCCCACAT 
TCACTCCAATATCAATCGCTAAGATTGTCACGAGTGTAGGTAAAATAATAT 
TCGCTACATTGTTACCGATTAATATAGTAGTGATAAATTCACTTGGTTTAT 

CTAATAACTTTGTTAAACCTTGCGCCTTACGATCTCCCTTTTTAGCTTCAG
TTTGAAAClTAGTGCGATTCGCGGCTGTTAATGCTGTCTCACTACCrGAAA 
AGAAAAATGAAACAAATATTAATAGTATAATTGCAATAATCATGATGCATT 
TCTCCTTAAGATTTAGTCCGTATTAATACATTATTTCCCGAGAAATGAATT 
ATCAAACTTATCATTTATATTGAATTAACACAACACACGAATAAGAATGAA 

CTAATGCAAAATATTTTTATAGTTTGCCTATAATTAATAAAGAAGATATCG
ATTTAATTATTAATAGAAGTTTAGAACAAAATGTATTTCTACTAACTGAAT 
ACATTCAAAAAGGTAATAAGAATAAAGCTATACAATTAGTCAATGATTTAA 
TCATTATGAAAGAAGAACCTATAAAATTACTGGCATTAATTACTAGTAATT 
ATAGATTATATTATCAATGTAAAATTCTTAGTCAAAAAGGCTATAGTGGGC 

AACAAATTGCAAAGACTGTAAATGCACACCCTTATAGAGTAAAACTAGCAC
TCAATCAATCCCGACATTATAAACTAGAAAGTTTGTTTAACATCATAAACG 
CTTGTGCAGAGACTGACTACAAACTAAAATCATCTTATATGGATAAACAAC 
TCATTTTAGAATTATTCATACTATCTCTATAAAAAATTTGAGGTAGGAGAT 
ATAAAATAAATCCTATCGACAATAGAAATATTGGTTGTAAAGTAATAAAAA 



WO 01/34809

PCT/US00/30782



AGAGCCTAGGACAATTCGTTTTGTCCTAAGCTCTGTTTTAATAATCATTAT 
TTATTAGCTGACATAAGTTTAGATTTAATACGGTCAGCTTTATTAGAATGG 

ATTAAATTACTTTGAGATGCTTTGTCAACTTGTTTGATAGCAAATCTTAAT 
AATTCATCTTTGTTTTCAGCATCAGTAGAGATAGCTGTTTTAGCTCGTTTC 
ACAGCTGTACGCATAGCGTTTTTCTTAGAAATATTTCTTTCTTCAGCTGTT
TCAGTTGTTCTTACACGTTTGATTGCAGATTTAATATTTGGCATTACTGTC 
ACCTCCTAAAAGTGATCATAACTTATCAAAATTTATTTGATTACAACAAGA 
AATATTTTATCAAATGTCTTATATTTGTGCAATCATTTATTTAAAAGTAAC 
TTCATTTGAATTTATTACTTTTTAATTGTTGGAATGAACTACTGAACTAGA 

TATAATAGTTATGTTAGAATTTACATTTAATTAATTATGTCAAAGATATAT
TGCATTAAAGTAATAAAAACGTTACTATATCAAAGATACATAGAAGTGACA 
GGTTATAAAGATGAAAGCGAGAAGGATAAAATGGATAAGCAAGAACGATAC 
AATAGAAGAGAAAATATTAGAAATTTCTCCATTATTGCTCATATAGACCAT 
GGTAAATCGACATTAGCTGATCGAATTTTAGAGAATACAAAATCAGTTGAA 

ACTCGAGAAATGCAAGATCAATTACTTGACTCTATGGATTTGGAAAGAGAA
CGAGGCATCACTATTAAACTAAATGCTGTTCGATTAAAATACGAAGCTAAA 
GATGGAGAAACTTACACATTTCATTTGATAGATACACCAGGACATGTCGAC 
TTTACATATGAGGTTTCTCGCTCATTAGCTGCATGTGAAGGTGCAATTCTT 
GTAGTTGATGCTGCCCAAGGTATAGAAGCACAAACCTTAGCAAACGTTTAT 

TTAGCATTAGATAACGATTTGGAACTTTTGCCAGTTGTTAATAAAATAGAC
TTGCCTGCAGCTGAGCCCGATAGAGTTAAGCAAGAATTAGAAGATGTTATA 
GGTATAGATCAAGAAGATGTAGTACTTGCAAGTGCTAAGTCAAATATAGGT 
ATTGAAGAAATTTTAGAGAAAATAGTTGATGTTGTACCAGCACCGGACGGT 
GATCCAGAAGCCCCACTTAAAGCACTTATCTTTGATTCAGAATATGATCCA 

TACAGAGGAGTAATATCTTCAATTCGAATTATTGATGGTGTTGTTAAAGCT
GGAGATAGGATTAAAATGATGGCTACCGGTAAAGAATTTGAAGTTACAGAA 
GTCGGAATCAATACCCCTAAGCAACTACCGGTAGAAGAATTAACAGTTGGT 
GATGTGGGTTATATTATCGCAAGTATCAAAAATGTTGATGATTCTAGAGTA 
GGTGACACTVATTACTTTAGCTGAAAGACCTGCTGACAAACCGTTACAAGGA 

TATAAAAAGATG/^TCCAATGGTATTTTGTGGTCTATTCCCTATTGACAAT
AAAGACTATAATGACCTAAGAGAAGCTTTAGAAAAATTACAA 



Sequence 3416

step.1002g10.cons.ok

TGTAATACCAATAAGTTCAGTTCCTTGGAATGAGAAGCCTGCGACTAGAAA 
TACACCAAGAATTGATAGCAAGCTACCTCCTAAATTCCCACCTAGTATTGG 
CCCATCTCCTTTTGTAAACGTATCAAATCCTACAAATTCTCCACCCATAAT 
CCCTAAAATAGTTAAGATACCAATTCCTATAAATATGATGACTGTAACTAC 

TTTGATAAGTGCGAACCAATATTCACTCTCTCCATATACTCTAACGGATAA
AGAATTAAGCGCGAAAATAATAATAAGGAAAATACAACTCCAGACCCAAGC 
TGGTATACCTTGCATAGGGGAGCAATATTGTATAACTTGCGCTGCAATAGT 
AACATCAGCTGCTACTGTTATCACCCAGTTGAACCAATAATTCCATCCTAA 
AGCAAAACCTAGTGAAGGATCGACAAAGCGTGTAGCATAAGTACTAAATGA 

ACCTGACACAGGCAAATATGTCGCCATCTCCCCTAACGACGTCATTAGAAA
GAACACCATCGCTCCAATAACTGCATAGGCAATCAAAGCACCCAATGCACC 
TGCGTCATGTATAGCTCCACCAGAAGTCATGAATAGACCAGTTCCTATACA 
TCCCCCAATGGCAATCATTGAAATATGTCTATCTTTTAATCCTCTTTGAAC 
ACCGTCATTTTGTTTATTATTCTTTTTCATAGAAGCTCCCTTTCCTTCATA 

AGTAGAGGCTAACTTGTTACCACATCAATGCATATCATTGTTGTAGATAAA
TTTGGTATAGATGATATAAAAGTAAAATAATATTAATATCATTTTACTTTT 
ATATCACTTGTCTTTATATCTGTATCAATAATGTAAAAAGTGGTTACATAC 
CTTTAAGATATGCAACCACTAATGTTTATCTATATAAGGTAGCACATCACA 
TAAAGTGACAGTACAACTTCTATTCGAAAGTTGTCCCAACAATGCTAATTT 

ATGGAAATAACATCATTTCGGCATTATTACCTTTCACTTATTCAACACATG
TCCCATACGACTTCTGATAAGTTAGTCATTAATAATGCAACCTCTAACTCT 
ATTTATTATATTAATTTGTAATATACATGATATACTTTATATTTTCAAGTT 
TATCTATGCTTTTTCAAACGGATTTCATCAATAACATTCCAAATAAATTCA 
TCTTTTTCAACTGATTCTTGTTTTTCAGAACCATATTTTCTTACATTTACT 
TCTTGATTCTCTACTTCTTGGTCACCTACAACAATCTGATAAGGTATTTTT
TTCATTTCAGCTTCACGAATTTTATATCCCATTTTTTCATTACGGTCATCA 
ATTTCAACGCGGACACCTTGCGATTTTAGTTCATCTTGTAAAAGTCTTGCA 
TAATCATAATGTAAATCTATATTTACAGGAATAATTTCAACTTGCATAGGC 
GCCAACCAAGTTGGAAATGCACCTTTTGTTTCTTCTGTTAAAAATGCAACA
AAACGTTCCATAGTAGAAACTACACCACGGTGTATAACTACAGGACGATGT 
TGTTCTCCATCTTGACCAATGTACGTTAAGTCAAAACGTTCTGGTAAAAGA 
AAATCAAGTTGTGCTGTTGATAGAGTTTCTTCTTTTCCCATAGCTGTTTTT 
ACTTGAACATCTAACTTAGGTCCATAGAATGCTGCCTCACCAATAGCTTCT 

TCATAAGTTAAACCTAATTCATCTGATGCTTCTTTAAGCATGGATTCAGCT
TTTTCCCACATTTCATCATCATCAAAGTACTTATGCTTATCTTCAGGATCT 
CTATAACTCAATCTGAAGCGATAATCTTCAAAACCAAAATCTTTGTACACA 
TCTTGAATCATATTAACTACACGTTTAAATTCTTCTTTAATTTGATCAGGT 
CTAACGAAAATATGGGAATCATTCAATGTCATTCCTCGAACACGTTGTAAA 

CCTGATACTGCACCACTTGCTTCGTAACGATGCATAGTACCCAATTCAGCA
ATACGTATAGGTAATTCGCGATAAGAATGAGGTTTGTTTTTATAAATCATC 
ATATGATGTGGACAGTTCATTGGTCTTAAGACCATTGCTTCGTCTTCATCT 
AACTTCATTGCTGGGAACATATCTTCTTGATAATGATCCCAGTGACCAGAT 
GTTTTATATAAATCAACATTGGCTAATACTGGTGTGTAAACATGATCGTAT 

CCCATACTTACTTCTTTATCGACAATATAACGTTCTATTTCCCTACGTATT
GTAGCACCATTTGGTAACCATAATGGTAAACCAGCACCAACGAGTTGATTG 
TTTGTAAACAATTCTAAATCTTTACCAATTTTACGATGATCACGCTCACGA 
CGTTCTTCCAACATTTTTAGATGTGCTTTCAAATCTTTTTTGTCAAAGAAT 
GCTGTACCATAAATTCGTTGTAACATTTTATTATCACTATTTCCACGCCAA 

TAAGCACCAGCTGTAGATAATAGTTTGAACTCTTTAATTTTAGAAGTAGAA
GGTACGTGTACACCTCGACATAAATCAGTAAATTCACCTTGAGTATAAAGT 
GTTACACTCTCATCTTCAGGAATTGCATCAATAAGTTCTAATTTATAAGGG 
TCATCCTTGAAGAAGTCTTTTGCTTTTTCTTTACTAACTACTTCTCTTACA 
ATTTTATGATTTTCGTTCACAATTTGTTTCATTGTTTTCTCAATTTTATCA 

AAATCATCCGATGAAACCTTATCATCCATATCAAAATCATAATAGAATCCG
CCTTCTATTACAGGTCCAACTCCAAATTTAACGTCTCCGTATAAACGTTTT 
AATGCTTGTGCCATTAAATGAGCTGTTGAATGACGAAGTACTTCTAACGCT 
TCTTCACTCCCAGGAGTAATAATTTCAATAGCTCCATCTTGTTCTAAAGGG 
CGTGTTAAATCTACAAGTTGACCATTGAATTTTCCCGCAACTGCTTTTTTT 

CTTAATCCTGGACTAATTGATTGAGCGATGTCTTCTGTAGTAGT

ATACTGCGATATTAACCGATGAAGAAATACGTGAAAAAACAAAATCTTTTC
AAAAAGAGTTAGCAGAAATTGAAGATGTAAAAAAACAAAATGATTATTTAG 
ATAAAATTTTACCTGAAGCTTATGCACTTGTACGTGAGGGGTCAAAGAGAG 
TATTTAATATGATTCCTTATAAAGTACAAGTAATGGGTGGTATTGCTATAC 
ATAAAGGTGATATTGCAGAAATGAGAACAGGTGAAGGGAAAACATTGACTG 

CAACCATGCCGACGTATTTGAATGCTTTAGCTGGTAGAGGTGTACATGTTA
TTACAGTCAATGAATATCTATCAAGTTCACAAAGTGAAGAAATGGCTGAAC 
TATATAACTATCTTGGCTTAACTGTAGGTTTGAACTTAAATAGTAAGTCAA 
CTGAAGAAAAACGTGAGGCTTACGCACAAGATATCACTTATAGTACGAATA 
ATGAACTTGGGTTTGATTATCTTAGAGATAATATGGTGAACTATGCTGAAG 

AGAGAGTAATGCGTCCTCTACATTTTGCAATTATTGATGAGGTCGATTCCA
TATTGATCGACGAAGCAAGAACACCTTTAATTATTTCTGGTGAAGCGGAAA 
AATCTACTTCTTTATATACTCAAGCAAATGTTTTTGCAAAAATGCTTAAAG 
CGGAAGATGATTATAATTATGATGAAAAAACCAAAGCTGTACATCTTACAG 
AACAAGGTGCAGATAAAGCTGAACGTATGTTCAAAGTAGATAATCTTTATG 

ATGTTCAAAATGTGGAAGTGATTAGTCATATTAATACAGCTTTAAGAGCTC
ATGTTACTTTGCAACGCGATGTTGATTACATGGTCGTTGACGGTGAAGTAT 
TAATTGTTGACCAATTTACTGGACGTACAATGCCTGGACGTCGTTTTTCTG 
AAGGTTTACACCAAGCAATTGAGGCTAAAGAAGGTGTAGCAATTCAAAATG 
AGTCTAAAACGATGGCATCCATTACTTTCCAAAACTATTTCAGAATGTATA 
ATAAGTTAGCGGGGATGACTGGTACAGCGAAAACCGAAGAGGAAGAATTTC
GTAATATCTATAATATGACAGTTACCCAAATTCCAACAAACAAACCTGTTC 
AACGTAAAGATAATTCAGACTTAATTTATATTAGTCAAAAAGGAAAGTTTG 
ATGCGGTAGTTGAAGATGTTGTAGAAAAACATAAAAAAGGACAACCCGTCT 
TACTAGGTACTGTTGCTGTTGAGACTTCTGAATATATTTCAAATTTACTAA
AAAAACGTGGTGTCAGACATGACGTATTAAACGCTAAAAATCATGAACGCG 
AAGCTGAAATCGTTTCAAACGCGGGGCAAAAAGGTGCAGTTACAATTGCCA 
CAAATATGGCTGGACGTGGAACAGATATTAAACTTGGTGATGGTGTTGAAG 
AGTTAGGTGGACTTGCTGTTATTGGTACTGAGCGTCATGAATCAAGACGTA 

TTGATGATCAATTACGTGGACGTTCAGGACGCCAAGGTGATAGAGGAGATA
GTCGTTTTTACCTATCTTTACAAGATGAATTAATGGTACGTTTTGGTTCAG 
AACGCTTACAGAAAATGATGAACCGTTTAGGAATGGATGATTCAACGCCAA 
TCGAGTCGAAAATGGTATCTCGAGCTGTAGAATCAGCTCAAAAACGAGTAG 
AAGGTAATAACTTTGACGCGCGTAAACGTATTCTAGAATACGATGAAGTTT 

TACGTAAGCAACGTGAAATTATTTATAATGAGCGTAATGAAATCATTGATA
GTGAAGAAAGTTCTCAAGTCGTTAACGCGATGTTACGTTCTACATTGCAAC 
GTGCGATTAATCATTTTATTAATGAAGAAGACGATAATCCTGACTACACGC 
CATTTATCAATTACGTTAATGATGTGTTCTTGCTGAATTATTAATGTTAAA 
TGCTATCCATGTGCCATTACACACCGGTAAAATTGAAGAAATGACACGTGT 

ACTTCGTGAAAAATAAGCATAATATATTTAAGCGTCTGGATTATCATCCCC
AGACGCTTTTAAAATGTTTAGAGAATTATCAAATTTTATTTGAATGCTTTC 
GCTAAAACAAAGATAAACATATCTTTCGAGGAGTTTTATCTAGTATACGTA 
CATTCTTCCCTCAACAAGCTAAAAACATAACTATCTGAGAATTCCCCTTGT 
AATCGTTCATTACTTCTCAATATACCTTTGCGTGAACCCAAGTCTAATTGG 

TATAGCTCTGCTTTTCTTATTTTTAGTAGACATTCGTATTTCTATACGATT
AATATCATACACTTCATATACATAGCGAATTAACGCTTTAGTACATTTAGT 
CATAATACCTTTCTTTTGAAAGTCTTCAGCTAAATAATAACCAATTGAAGT 
TGTTTTATTAACTAAATCTAAGTAATGCAATCCTATGACTCCAATCAATTC 
TTTATTACTCCATATTCCACAATGAAATCCATTACCATCGATAAATTGTTG 

CAACGCCGAATGGATAAAGTGTTTACTATCTTCAACTTTCTTCGTATGTTC
AACAAAAGGCAGAAATTCAGCTAAATAGTCACGATTGCTATCTACTAATTT 
AAATAACTGTTCGGCTTCTCGTTCTTCTAATATTTTTAATTTAATATTTTT 
ATTTATTTGATAACTAAACATACAAAATCATCCTTTCTTAGAATTTTCTGT 
CATTTTAGTATACGCGAGTATAGGATTATGTCATAAAACTTGATTACTAGA 

ATTCGATATTCTTTAAGTTACAAGTGTGTATGTAGTATTTCTCTTGCTTCT
ATTGTCTTATACTTACAACCAATATTAGAGTGAAAAAGCTTAACTTAAGAT 
AGATAATCATCAACAACACAAACGTTAAAAACACTGAGAAAACTTTTTTAA 
AATATACTTCTCAATATTGATTAGTAAAATCCAAAATATGAATAATAAGTA 
GACATTAAACAATCTTTATTTTATTCTATAAATAATATAAAATAAATTATT 

AGATTAATACATTATTTTTTTATTTCTAATTTAAGTTGAAATCGTATGTAA
TCGGAGGAAACCATAATGATATATTCTATCGGTAAGAATTTAGGTAATAAA 
TTAACAGGTATAGAACAAGCTATGATCAATAGATTAAAGCTATTTAAAGAT 
AATTTAGTCCCAAATAAACTCATATTCACATCTTGGTCACCACGTTTATAT 
ATGCATGCACATTCGTTAAACATCGATTCAAAAGATATTTTCAGTCTTTAC 

GATTTTCTACAAGATAGTATTAACTTTGAGAAAAAACATATTGATTGGATA
AATTATTGGCAAAATATATGTAATTATACCTTAAAATTCGTTGAAAATACG 
AATGATATTAAAATATACGATAACGACACATATAAAATGTATGTGCATTTT 
GTTGATTCAAATTATCAAACTTTAGACTATATTAACCATTTTGATATACAA 
CAACGTAAAATTCGAAGAGATTTTTACGATACAAGAGGCTTTTTAAGTTGT 

AGTAGAATTTTAACCTCTCAACAAAAAGTCGTGATGGAACAATTTTTTACA
CCTACACAAAAAGTTAAATTTCAAAAATATTACAACCCTGAGCACGAACAT 
CCTACGGTACAATCTATCATTT 



Sequence 3418

step.1002h01.cons.ok

CTTGAAATATCATTTGAATTTTCTTATTAAACTTCAATCGCTCTTTACGTT 
TTCTAATTTGATGAATATCTTTACCATCATACTCGATTGCGCCATCAGTAA 
TATCATTTAATTTGATAATCGCTTTTCCAGTCGTTGATTTACCACATCCAG 
ACTCTCCTACTAGGCCTAAAGTTTCCCCTTTATAAACATCAAAACTTATAT
TATCAATCGCTTTAACTTCATTATGTTTACCTTGATTAAAATACTGTTTTA 
AATTTTTAACTTGTAATAATACATTTTCATTACTCATTAAAAGCCACCCTC 
TCTACTCTATGTGGTTGTTCATAATTACTTGGCATTGTTCTTAATCGTTTT 
TGAACCATTGAAGGTGGCGTAACTTTTGGAGCTCTTGCATCTAATAACCAA
GATTTAACAAAATGTGTGGGTGAAATTTTGAACCAAGGTGGTTCTTCTTTA 
AAATCAATATCTAAAGCATATCGACTTCTACGTGCGAAAGCATCACCAATT 
QGTGGATGAAGTAAATCTGGTGGTGTACCTGGAATTGCAATTAAGTCCGTG 
TCATTACTGGTTGTTAAATCAGGCATTGAAGAAAGCAATCCCCAGGTATAG 

GGATGTTTAGGATCATAAAATATTTCATTCACATCCCCTGTTTCAACCATC
TGTCCCCCATACATTACGGCTACTTTGTCCGCAATATTTGCTACAACGCCT 
AAATCATGCGTAATAAAGATAATTGAAGTTTCAATCTTATTTTGTAGTTCT 
TTCATAAGATCTAAAATTTGAGCCTGCATTGTCACATCTAAAGCAGTTGTA 
GGCTCATCAGCAATTAATATTTTAGGCTCACATGCCAATGCTATTGCAATA 

ACTATTCTCTGACGTTGTCCTCCTGAAAATTGATGTGGATAAGCTTTAAAT
CGTTTTTCAGCACGAGGTAACCCAACTAAATTCAAAATTTCCAATGCTCTT 
TGCTTGGCCTTTGCTTTACTTAATTTCTTATGTTTAATCAAAGGTTCCATG 
ACTTGCTTTCCGATTTGCATTGTTGGATTTAAAGAAGTCATAGGATCCTGA 
AATATCATTGAAATATCTCGACCTCTTAGCTGTATCAGTTCTTTTTCACTT 

TTCTGAGCTAAGTCCTCACCTAAAAATAAGATTTCTCCCTTTTTTATTCTT
CCTGTATCCTTTTGAAAAAGTTTTGTAATTGCCTTAGTTGTTACAGATTTT 
CCAGATCCAGATTCTCCAACAATGGCTAACGTTTCCCCTTTATTTAAATGA 
AAATCCACGCCTCTGACAGCTTGCACTTCTCCTGCAGCAATATCAAAGGAA 
ACGTGCAAATCATTTACTTCTAATACCGTTTCTGTCATTACTTTTACCTCC 

TTTATTTTCGCATTTTTGGGACGTTTAAAACGTTTAGCTTTCTTTCTAATA
CTTTCGTTTTTACTTTGTTTAGCAGCGTCAAAGATTAAGCTGTATAAACCA 
GATTCGTTGATGATTACCATGTTTCGTTTTTGACCTGATGCACTAACTTGG 
TGCGTCAGCTTATCTTCATCATCAACATGATTTCTAATTGCATTATCAGAT 
CTTGTGTAACCTAATATTTCTGCCACGTCTTTACCTAAAAAATATGGTTCT 

CCATCTACTTCTATTTTTCTTACTGGTAAATCTTCAAAATTAAATACTTGT
AAATCTTGCATATTGTTTATGCTCCTTTCTGCTATACTCCTTATAAAAGGA 
GGTGAAAATATGAAAAGTTTTATTATTGCGTATGATTTAAATAACCAAAAG 
GATTATCCAAAATTAATAGAGCGTATTGAGGATTATCCTAATGTTGCTAAA 
ATCAATAAATCAGTTTGGTTTATTAATTCAACTAATGATGCTAAAACTATT 

AGAAACGAATTAAAAATGTTTATTGATAGCGATGATAGTTTGTTCGTTGGT
AAGCTGACTGGTGAAGCCGCATGGTCTAATGTAATTTGCAGTTCACAACAT 
TTAAAAGATTATCTTTAGTATCGTTAATCTTGAGCCTCTCTTTTTGAGGGG 
CTTATTATAATTTCGTCAAATTCTTCTGCATTTTTTATATTAGCTGCGACT 
TCTTCTAAACTTTCAAAAACTAATGTATATCCACTATTCTTTGTTTCTTTT 

ACAAAAACCCTGTATAATACCTTGTTTTCAGTCATTTGAATTCCTCCTTTA
AGTTAAAACTTTCTTTTTGCGTAAGTCTTTGTTAAAAAAAATATCTCTTCC 
TTCTTGAGGTGTTAAATCTAACGCGAAATAAATTCCATTTATTACTGGATA 
TGAAGGTTTAGTTCTTCCGTGTATCATATTGGACAATGTATCTCTATTAAC 
ACCTATTTCTTCAGAAAGAGTTTTGATGTTGTGTCCTTTTAAAGCCATTTT 

TGATTTTAAAAGTTTAGTGTCGATAGGCATTTCATTTCACCACCTTTCGTA
TTACGTAAGTAATCTTATCATGGCTCTACAAAATAGGTCAAGCATTTTACG 
AAAGTTTTTAAGAAAAAATATTGCAAATCACGAAAGTTTTCCTTATAATAT 
AGTTATAAAGTAAAAGGAGCTGTATTACGATGTGCTTTTCAAAAAGAATGA 
AACAATCAAGAGAAAAACAAGGTATGACTTTAGCTGAACTAGGAAGAAAAA 

TCGGTAAAACTGAAGCTACTGTACAACGTTATGAAAGCGGGAATATTAAAA
ATCTTAAAAATGATACTATTGAAAGTATAGCTACTGCATTAAATGTTAACC 
CTGCTTTCTTGATGGGTTGGATAGAAGAAGTTGAGGAACAACCACAACATC 
GTGCAGCGCATCTTGATGGTGATTTAACTGACGAAGAATGGCAAGAAATTC 
TTGATTACGCTGAATACATAAGAAGTAAAAGAAAATAAAGGGTGTTTTATG 

TGGGGAAATATGAAGATATGTTAATTGAACATGACTATATTGAAGTCATTG
AATGTGATAACTTACCTAAAAGGTTATCTGGTTTGTGGCTTGGAGATATGA 
TTTTAATTAATCGTAACTTGCCTATTACTTCCAAACTTGAAACACTTGCAG 
AGGAACTCGCTCATAACGAACTTACATATGGAAATATAGTTGATCAAAGTA 
GTTTTAATCATAGAAAATTTGAAGGTTATGCACGTAGGTTAGCCTATGAAA 
AGTTAATCCCTCTTAAAGATATTGTAAAAGCATTTTTGCAAGGCATTCATG
ACTTGTATGAACTTGCTAATTTTTTTGA 



Sequence 3419

step.1002h02.cons.ok

AAATAAACGAGAAAACCATAATCATGATAATCAACCATACAAATAGTACAA 
ATGTATTAAAGGTTGGATACTTATATCATATGTTTTAATGTTTGAAACTTT 
TATTATAAATAAATGAATTGAAAAATGCTTGTTGCTGACTGTTTCGAATCA 

TAGTCAGTTGTAAGGGAGCGAGACGACGAATCTGAAAAGAATCGGTCTCGC
TTTTTTAGTATAAAAATTTAATTATTATTTTAAAATGATAAAGTGATTTCT 
ATAGTTAAATTTAGTGCGAG«ATACTTT7UVA7^TTATATTGAAAAAATGT 
CGCTATTACAAAATGTTCTAAAGTGCACAATTCAATTTAAAAACAAAAAAG 
AAGTTTTAAAATTTATATTTCTGTCTATATATGAGTTGTGCACGAAATGAC 

AAACGCAATTCACATATGAAATATTTATCACTTTACAATAATATGTTGTTA
AGTTGATGTAGATAAATTCATTGTTTTACGTATAAGTCACTATCATTGTTT 
TTTAGGATTCATTTTAATTATATCCAATCGACTAAAGTGAAAAGGAGGAAC 
ATATTAAATGAATTGGATATCGATTATTTTATTTATTATGGTCGTATGTGG 
CATTTCTTTTTATGCCTATTTGCAATCAAGAAAAATTAAAACGAATAGTTC 

AGATGGTTATTTTATGGGAGGAAATAGCCTTACTGGATTTACAGTTGCCTC
TACAATTATCATGACCAATTTGTCGACAGAACAAATTGTTGGGCAAAATGG 
TCAAAGCTATGCACAAGGAATGGAAGTAATGGCTTGGGAGGTTACGGCTGC 
CGTAGCAGTTGTATTGTTGGCCTGGGTCTTTCTTCCTAAATATCTAAAATA 
CGGTGTTAATACAATTTCTGAGTTTTTGGAATTACGTTATGATACATTTAC 

TAAACGTTTTGTCTCCATTTTATTTATCTTTACCTATGTCGTATCTTTCTT
GCCTGTAGTATTGTATTCTGGTTCCCTAGTGTTCAATAAAATGTTTAAAGT 
TGATGAATACTTAGGTGTAAGTAGTTCAACTGCTGTCATTATTATTTCATC 
TATTATTGGTATAATTGGCATTATTTACTTATTTATAGGTGGTTTATCGTT 
AAGTGCTTTTAGTGATTCGATTTATGGCATGGCTTTAATTATAGCTGGACT 

TGCGATTACAATATTAGGTCTAGGTCAATTAGGAGATGGCAACTTCCTACA
TGGTTTCGACAAAATCGTGCAAGACACGCCTGAGAAATTGAATGGTTTTGG 
TAAGGTGGACTCGGATGTTGTACCTTGGCCAACCCTATTCTTCGGTATGTT 
CTTTAACAATTTATTCTTCTGGTGCGCAAACCAGATGATAGTTCAAAAAGC 
ACTCGCAGCTAAAAATTTAAAAGAATCTCAAAAAGGTGCAATATATTTAAG 

CTTATTTAAAGTGTTCGGACCTTTATTTACAGTCTTACCAGGCGTAGTAGC
ATTTAACTATTTTAATGGTAGTCTTGATAAATCAGATAACGCTTACCCTGC 
ACTTGTAACTTCAGTATTACCAGAATGGGCATTCGGCTTATTTGGTGCGGT 
TATTTTTGGTGCAATATTGAGCTCATTTGTTGGCTCATTGAATAGTACAAC 
TACACTATTAACACTCGATTTCTATAAACCTATTTTTGGAAAAAATAAATC 

AGATAAACATATTGCTCGAGTAGGCCATATTGCTACTGTAGTCATTGGAGT
TATTGTTGTAGCACTTGCACCAGTCATCTCATTATTCCCTAGTGGTCTTTA 
TGCAGTAGTTCAACAGTTTAATGGTGTGTATAGTATGCCAGTGCTAGCTTT 
AATTTTAATCGCTTTCTTTTCTAAACGCACATCTAAATTGGGCGCTAAAGT 
GACACTCTTCACACATATAATTTTATACGCTATAATCAGCTTTGTATTTAC 

AGAAATTAATTATCTATACACATTTAGTGTATTATTCTTTGTAGATTTAAT
TATTATTTTGATTTTTAACAAAGTTAAACTATCTAGTGAGTTTGATTTAAG 
CACGCACCAACCGAAAGTAGATATGACGCCATGGAAATATCGTTACGTTGC 
AGGTATTATTGTTCTTGCTTTAGTAGTAGTAAGTTATATTATCTTCTCACC 
ACTCGTGTTGGCAAAATAAATGTTACAATTAAAACTTAATGAGTAAGAAAA 

TAAAAGTGTACGTTAAGGACGTTTATCAGCGTGATCACGTGCACTTTAACT
ATGAATAAAGTTGATTTAAAAATTAGAAAGGAGGGACAACGACGATGCAAA 
CAGTCGGAATTATACCTTCGCCAGGTATAGCACATCAACATGCAAAAAAAA 
TAATTCCAAATGTTAAACAGTTATTGTCAAAGCGTACTAAACATAGTCAAT 
GGAATTTCGACATCAAAGTCGATCTCATGATAGGATCTGCAGAGGATGTAC 

ATGAAAGTGTTGAAAAAGCAGCACAAATTAAAGAGGAACATCAGTGGGATT
ACGTTGTTTGTCTGACAGATTTGCCTAGTATTTCAGATAATAAAGTGGTTG 
TCAGCGACTTTAATAGTGACAAACATGTTGCAATGCTATCATTACCGTCAC 
TAGGTTTTATTGATTTGAAGCGCAAGCTAGTTAAAACGATGACTTCATTGA 
TTGAACAATTATATTATAATCAACCGAAAGACAAAAATGCGCCACATCCTT 
TTGTACGCGTGAAGGCTGTAGAACCTGACGAAGACGCCACATCAAAACAAC
GATATATTAATATTTTATTTATCATAAGTTGGATTCAGTTAATTGGTGGAC 
TGACACGAGCAAATCAGCCTTGGAAAAACATCTTTAATTTTAAGAAAATCA 
TTTCAGTTGCCTTTGCAACAGGAACTTATGTCTCAATATTTTCAATGCCAT 
GGGAATTAAGCGTGATTTATTCACCGCTTCGACTTATCATATTGATGGTGA
TTGCTATACTTGGGATGGCTGGATGGCTATTCTATGCGCATCAATTGATTG 
AAAAGAAAACTGCTAAATCTCAGCGTGTATATCGATATATTTATAATTCAA 
CCACACTTGTTACACTAAGTTTGATTACACTCATAAATTATGTCATTTTAT 
ATTTATTGTTAATCATCAGTATTACACTCTTTGTCCCTGTGGAATTATTTA 

ATAGTTGGACGAGTGCCCAATCACAATTTACGTTCTCAAATTATATGAGAT
TGATTTGGTTTGTATCATCATTAGGACTTTTAGCTGGAGCTATGGGATCAA 
CTGTTGAAAATGAAGAGAAAATACGTCGTATTACTTATTCTTATAGACAAT 
ATCATCGTTATAAAGAAGCTGGCAAGAACAAAAAGAACAAGAAACTTCTCG 
TGATGTATCACAACAAAATGTCGAACAACAAACTTCAAGTAAAGATGAAAA 

TAATGAACAATATGAAGGTAAAAAACAAGGACATAGAGAGGAGGATGACGC
ATGACAAATCAAAAAACTGTGGGTCTAGTCGTCGCTCCAGGTGTTACTGAA 
CGCCTTGCAGAAAATCTCATACAAGAAATGCCTAAAATGTTATCTACGCAT 
TATGATCATCAGCAAGAATGGATTTTTGATTTA 


Sequence 3420 

step.1002h05.cons.ok

CCAACGTGAGATAATCAGCACCTTTTTCATTAATAATAGCTGCGGCCGATT 
CTAATATTAATTGTCTTTTGGAAGTTGTAGGCATTTCGTCACCTTCTTTAT 

CGAATTCTAAATAATATATAATTGTAATTATTTAACACTTTTTCAACATCC
ATACTTATCGAGTGTATTTATTTTATAAACAACTTTTCTTAAATTCAACTT 
TATTAACATGAATTGTATTTATTAATAACGATTTACTTATTACTATATGTC 
ACAAACATATAAAAAGTAAACTTATCATATAATATATGTTCCTCTGTCAAC 
GTCGGTTAATTATGAAAAAATATTAAGTAGCTCCATTCCGTTGCTTAATAC 

TTTTTTATTCACTTATAAAAATCCAAGAAAAGAGAGAGCGATTTCGCCCTC
TCTTTTTTGTGTAGTAGTACTATTAAATTTTATCTCACTAACAATTAATAT 
ACTTATTAATAAAATGATGTATTTAAAAATTATAATAATCTTGAAATATAA 
TTATTAAGA/^TTCGCAGTTATATCATTTATTAACTCAGCCTTATTGTACG 
TGGGACGATTTTTTAGCAGTCATAAAGTATACAATAATACCAATAATGAAA 

CCAATAAGTGCTGGAACAATCCATCCCATACCAATGTTAAAGAATGGGAGA
TAATTTTCACCAAATCCAATGATTGTTTGTGCAAATTTTGTATTAACAAAG 
AACTCTGGACTTGCTTTCACACCATCTACAAATGCCGCCACCATTGTAAAT 
AATGTTGTAAATCGATAGACAATTTTAGAATGTTTAAATAATGGACTAAGT 
AATGCTAATAAAATTAAAGTAATCGCTAAAGGATAAATGAACATCAACACT 

GGTGTTGAATACATAATAATTTTTGTTAAACCTACATTAGCAAATATACAA
GCTAATATACTCACCCCAGTAGCTAACCAAAGATAGTTAGATTTAGGGAAT 
AACTCTGTAAATGTTTCCGAAAAGGCTGTGATCAATCCTATTGCTGTTTTC 
AGACATGCCACAATGATGATTAGTGACAAAATAATAATTCCGTAATCCCCT 
AAATAATGTTGTGCAATCTGAGCAAGCGCAATACCACCATTTTCACTTACT 

TTAAAACGACCTAAACTCATCGTACCCATTAAAGCTAATAAAGTATAAATA
ACGCCCATAGCTATAATACTAATCGTACCTGATTTTAAAGTTTCTTTAGCG 
ATTGTATTCGGATTAGTAATCCCCAACTTTTTAATTGTAGTAACAATGATA 
ATACCAAATGCTAATGATGCCAAAGCGTCTAATGTATTATATCCATCGATA 
AACCCTTTGAGTAACACGCTATTGCTATAATCAGCACTTACTGGCGCATGA 

CTAATTCCACCCATAGGACGGATAAATGCAAGCACAACAACAATTCCAAGC
AAGATGAGAAAGACCGGATTTAAGAATTTTCCAATATAGTCTAATATTTTA 
GAAGGTTTACGCGAAAATAACCACGCTACTCCGAAGAATAAAATACTAAAA 
ATAGCAACAACGCTTGGGCCGTACCAGATGAAATAAATGGTGAAAATGCTA 
TTTCAAATGACGTCGTCGCAAGTCTTGGCAACGCAAAAAACGGACCTATAA 

CAAGATACAAGCCAATTGTGAACAAATAACCATATATTTTACTTATCCTTG
AGGAAATTTCAAAGACCCCGTTTGTTTTAGATACACCTATCGCTATAATTC 
CTAAAAAAGGTAGTCCGATAGCCGTGATAAGAAATCCTAAATTGGCGGTCC 
ATACATTTGCCCCCGCAGTTTGACCTAAGTGAATTGGAAAAATGAGATTTC 
CAGCACCAAATUW^GACCAAACAGCATTGAGCCGATAAATAGATTCTCTT 






TTAATGTTAATTTATTTTTCATCATTAATTATGTATCTCCTAATATGTATT 
ATTAGCTATTATACATGATTTTAACCGTCTGCGTGGCACGAAGTTATATTT 

ATATATGTGTGTATTATTACATCTTAATCTCATAGCTTTTCATTTTATAAC 
ACGATTCAAACTATAAATAATGAAAAATTTTTAATATTTCTATACATTGTT 
GTTTTTTATAAGAAATGTTAAGTTAATATTTATTTTTTATTAAAAATCCAT
TTATTTTAGCATATAATTTTTAATTAATAAAAGTTTTTGATAATATTAAGT 
GTACAAGGATTTTTATTTTTTATTAAAAAGTGAAAATCATTACTAAAAATG 
AGGAGGACTGTGAAATGACTGTATTGAGTGAGCAAGACAAAATAAGATTAT 
TAGCTGATATCGTCAAGATTCAAACTGAGAATGATCATGAAATTAAAGTAT 

GTGAGTACTTAAAAGACCTGTTT^GTCAATATGATATTGATTCTAAGATTG
TTAAAGTTAATGATTCACGCGCTAATTTAGTTGCTGAAATTGGTAGTGGCG 
CTCCAGTCTTAGCAATCAGTGGACACATGGATGTTGTAGATGCTGGTGACC 
ATGATGATTGGACTTTTCCTCCATTTGAACTCACAGATAAAGATGGCAAGT 
TATTCGGCAGAGGTACCACTGACATGAAAGGTGGTCTTATGGCGATGGTCA 

TCGCGATGATTGAATTAAAACAATCAAACGCATTAAAGCAAGGTACGATAC
GTTTATTAGCTACTACAGGAGAAGAGACGGAACAATATGGTGCACAATTAC 
TAGCTGATGAAGGTTACTTAGATGATGTGAGTGGTTTAATTATTGGCGAAC 
CAACTAGTAATATCGCTTACTATGCTCATAAAGGTTCTATGAGCTGTGTAG 
TCACAGCGAAAGGTAAGGCTGCACATAGTTCCATGCCACACCTAGGTACAA 

ATGCTGTTGATATTTTAGTTGATTTTGTAAATGAAATGAAACAAGAATATA
AAAATATTAAAGAACATGATAAAGTACACGAGTTAGACGCTGTTCCAATGA 
TTGAGAAACATCTCCACAGAAAAATTGGTGAAGAAGAATCACATATCTACT 
CTGGATTTGTAATGTTAAACTCTGTATTCAATGGTGGTAAACAAGTTAATT 
CTGTTCCTCATAAAGCGACAGCTAAATATAATGTAAGAACTGTTCCAGAAT 

ATGACAGTACTTTCGTGAAGGATTTATTTGAAAAAGTCATTCGTCATGTGG
GCGAAGATTATTTAACTGTAGATATACCTAGCAGTCACGATCCAGTGGCAA 
GTGATCGTTGGAGATTTAATTAAACATTTTAAAAAATAGGAGTGCATTTAC 
ATGTTTAAAAAGTTATTTGGAAAAGCTAAAGAAGTTGACAAAAACATTAAA 
ATCTACGCACCTTTGACAGGTGAGTATGTCAAAATTGAAGATATTCCTGAT 

CCAGTGTTTGCGCAAAAAATGATGGGAGAAGGCTTTGGTATTAATCCTACA
GAGGGTGAAGTTGTTTCACCGATTGAAGGCAAAGTCGATAATGTTTTTCCA 
ACGAAACATGCCGTAGGGTTAAAAGCCGAAAATGGATTAGAGTTATTAGTT 
CATATCGGATTAGATACCGTTCAATTGGATGGCAAAGGGTTTGAAGTACTT 
GTTGAAAGTGGAGATGACATTAAAATAGGTGACCCACTTATACGTTTTGAC 

TTAGAATATATTAATAATAATGCAAAATCTATTATTTCTCCTATTATTATT
ACAAACTCTGATCAAACTGAATCAATTCATATTGAAGACGTACAAGCAGTA 
GTTAAAGGGGAAACACAAGTTATTGATGTGACAGTAAGCTAATGGTTAAGA 
ATTATTCATTTTATCAATTTATTATGACCGTGCGC6GAAGAAAAGACGATA 
AAGGGGTTTTTGCTGAGCAAATTTTTGAAGACCTTGCCTTTCCAAAACACG 

AAGATGATTTTAATACATTATCTGAATATATTGAGACACATAGCGAATTCA
CCCTA 



Sequence 3421 

step.1002h06.cons.ok

ATACTTTCCTAAACTTCTAAATTGATTAGCATAAGCATCCAAATCGTCTTG 
ATGTTTATTTAGTTCAATAATCTTGTCTGCAAATTGATTAATTTTAGGCAA 
TGAATTATTAGCTTGATATACAGCACTCTTAATTTTATTGATTGTAGGTAC 
ATTATCTTCAATTGATAGTCCGACTTTATTAGCTTCAGAAAGTAAAGCGGT 

TGCTACGGTTTTGTTAAATTGTTTATTTGCTTTATCAATCACAAACGACGA
TCCCGTATCGGTTAACTTAGCTGCTACAGCATTAATCTTCTGATTTACTTT 
AAAATCTATATCCGCCTTTTGAGGATGTTTTCTTAAAGTACCAGTGATTTC 
ATGTGTGAATTTCTTCGGTATATAAATACCTGCATTU^TATTTTCCCATCTT 
AATTTCATGGTCGGCTTTTTCTCTACTCACAAATTGCCAGTCAAAACTATC 

ATTCrTTTTTAAAGTTTTGACCATTTTATTTCCTACATTAATATTCTTACC
ACGAACTTTTTCACCTTGGTCTTCATTCACTACGGCAACTTTGATATGTCC 
TGTATGACCATACGGATCCCACATAGCCCAGAGGTTAAACCATGCATAGAA 
TGAAGGAAGAATTGCTAAACCAGCTAAGATGACCCATACACCTGGTGTTTT 
AGCAACTCTTTTTAAATCCGTGATAAAAAGTTTTAGTGCGTTTTTCATTTT 
CACACTCCTAATTATCATTATTCTTTTTTATTGTAATACATTTCCGTACAG
ATGATTACTTAAATGTGCATATTTTTTTGCTTACGTAAAGATGATACACAT 
ATTATATTAAAATTTGATTAAATGGATTTATTTTTCTAATGTTGATTAGTA 
TAATATTTGTATTAAGGCGTAATTCATTTTTGTTGATGCTTGTCAACTTTA 
TAAACCATAAGATAAATTATATCAAATTCTCATTTGTTATGACAATAACTA
AAAATAATTTTTAAGGGAGCACACTGTAAAAGCCTTTAGACCTTCCTTTCT 
TAATGGCGTTTCAACATATCTAAATGATAATTCGGCTAATAATACAGTAAT 
TACGATATCCATAATATAAACATAAGTTGGTAATTGACCATCAATAAAATA 
ACTATGAATAAAGCTAATTACAGGAAAATGCCATAAGTATAAACTGTATGA 

ACGCTTTCCAATGTAGACAAATAAAGGATTTCCTAATAACTTAGCTAAAAT
TGTCGTTGGATGAACAACACTTGCAATAATTAGCAAAGTCATTGTTGAAAT 
AAGATAAAATCCACCGTTATAAATCCAATCACTTTCATCACTAACAGTAAA 
GAATAATAGAATTAAAAATGTAAGTCCTATGATACCCGCACTATTAATCAC 
AGTTTTTAATCCTTTAGGTGGATTGGGATTTAATTTAAAAGGTGGCCAGAT 

AAATGCTAGAAGTACACCTAAAAGCAGTGTCTGCAATCTTGTATCAGTTCC
AAAATATACTCTAGAATGGTTCAAGTGAGGTTGAGAAATAACAACCATCAT 
TAATAATGAAACCAGGGATATGATCCAAAACATCAGTATGACATTTTTCTT 
TTTCTTAACTATTGCCATAAATAATAAGAGTACTGCTGGGAAAAAAAGGTA 
AAACTGCTCTTCAATGGCTAGTGACCATAGGTGCTTTAAAGGCATAAAAGA 

AAATTGCTCGAAATAATTGACATCTTTAGCAATATACCACCAATTAGATAC
GTAAAATATTGCTGCTATCATATCATGTTTAACTCTTACAATATGCTCGGG 
GTGCAATAATAAAGTTGCAATTCCAACTACTACTATTAATGCAAATACCGC 
TGGTAATAACCTTTTAATACGACGAlATCCAAAAATTTTTAAGATTTATTGT 
TCCAGTATCTTCATACTCTTTAAGTAATAAGCTCGTAATCAAATAACCTGA 

AATAACAAAAAAAGTATCTACGCCTAAAAAACCACCTGTTAACCATTGTTT
ATTCAAGTGATAAATAATAATACCAATGACTGCAATTGCTCGCAAACCATC 
TAATCCAGGCATATATCGTATTTTTTTAGCAGAGCGATATTCATTATTTAA 
TTTTTTTCTCATTGTTACATTCTTCCCTATTAATAATAAATTTCGTTAGAC 
AAGTAACTTTTTTACTAATCAATTTCATAAGACGAACGCTATTGCTTCTAA 

TCAACTTATTCAAACAAATTTTATCTCATTCAAATTAACAAGGTCATTCTC
TTATGTTATAGCTATCCCTATCATATAGCGCACGTTTCCATTGAGTTAATT 
ATCATACTTCGTAAGATATACCTCATTAAAAGGTTATAAGTCTTAAGTATT 
GTAATACATTTGTAATATTAATTCAATTTTATTATAGGCCTAATTTATAAA 
AATATTAAATGAACATACGATAATTCTTATTGAATTTTCTTTAATAATTAA 

AAAGGATAGCTACATCATTATTCCAAACAATGAGTAGCTATCCTTTATATT
ATCCGATGCCTAAAGCTATTTTGGCATATCGTGACATCATATCTTTATTCC 
AAGGTGGACTCCATACAATCATCACTTCTGTATCAGAAATTTCAGGAATCT 
CAGCCAAAACACTCTTAACTTGTTCAATAATTTGTGGTCCTAATGGACATC 
CCATCGAAGTCAATGTCATTTCAACTGTACATAAACCTTCATCATCAACAT 

CAACTTTATATACTAAACCTAAATTAACGATATCTATCCCTAACTCAGGAT
CTATTACCATTTCAAGAGCGCCTAAGATACTATCTTTTAGTGCTTCTTCCA 
TCAGTATCACCTCTTTAAAATTTTCTTTACACCAATATATCAAATATCCGA 
CAAAACGCCAATAAAATGCTATGATACTGTTACATAATAATTAACAACACA 
AGGAGTAATATATGCAACCTTATTTAATTTGTCTAGATCTAGATGGTACAT 

TATTAAATGACAATAAAGAAATCTCACCTTACACTAAACAAGTATTAACCG
AATTACAACAATGTGGACACTACGTTATGATTGCTACTGGAAGACCCTATC 
GCGCAAGCCAGATGTATTATCATGAACTAAATATGAGCACACCTGTTGTTA 
ACTTTAATGGAGCATTTGTACATCATCCAAAAGCAAACGATTTTAAAGTGA 
TACATGAAGTACTTGATGTGGAAATTTCTAAAAATATTATTACAGCACTTC 

AACAATCTCATATTACAAATATCATTGCTGAAGTAAAAGACTATGTCTTTA
TAAATAGT 



Sequence 3422 
step.1002h07.cons.ok

ACATATCGCACGTGCAATAGCTACACGCTGTGCTTCTCCACCTGAAAGCAT 
ATGTGGATAACAATACAAACAATCAGATTTTAACCCTACTTCACGTAGTAA 
CGCTGTTACATACTCTTCCATATTCTCCTTTGATGTACATCGACACTGATT 
CATTACTTCAAACAAAATATCTTTGACAGTATGAAAAGGATGCAAAGAAGA 
AGTATAGTCTTGAAAAACCGCAGCAATTCGATGACGTCTCACTTTTTTTAA
ATGAACAGGCTGACCGTCCAACGTCACTTGTCCTTGATCTGGTTTTTCTAG 
TCCTAAAATCATACGACTCAACGTAGACTTTCCACTTCCACTTTCTCCAAT 
AATTGCTATTGATGCACCAGTGGGGCAATCAAATGACACGCCTTTAACTAC 
AGGAATACGTTTATCACGAAATATTGATTTTTTTCCATAAAATTTAATAAC
ATTTTGAACCTTAAGCATTGi\AATCACCCCTCAGCACTAACTGAAAATAAT 
CATGAATTGTCTTTTTAGTAGATAATAAATAAGCTGTATAGTGATGACGAG 
GTTGTTGTAATACCATATTTGTTCGACCTTCTTCTATCACGCGACCTTCTT 
TCATAACTACAACACGATCTGCAATTTTATTAATCACTGTGAGGTCATGAG 

ATATAAAAATCATGGCGCAATCAAAATGTTGTTTTATATCTTTAAAAGCAT
TGATGACTTCATATTGTGTAATGGTATCTAATGCTGTGGTAGGCTCATCAG 
CAATAATCAATTTAGGTTTCAATGCCAATGCTAGTGCAATCATCATACGTT 
GAAGCATCCCACCTGACAACATGTAGGGATATGCTTTTAATAACGAGCGGG 
GATCTTTCAACTTCATATACGTCATATAATGAATGAGTTCTTGTTTTATTT 

GTTGCTTTGATAATGTCGTATGTGCACGCAAAGTTTCAATCATTTGCTTCC
CCACTTTAGTTGAAGGATCAAATGCTCGTGTTCCTTGTTGCATAATCATAG 
CAATATCTTTACCACGAATTTTGCGTAGTTGCTTTTCAGTTATTGTATTTA 
AATTTTGATGTTCAAAAAAGATATCTCCGCTTACAGATAAACGTTGAGCAT 
TTAAACCTATTAATGCTTTACAAGTAATCGATTTTCCGCTTCCACTTTCTC 

CAATAATACCTAATGTTTCACCACATTGAACTGTGAAATTCACGTTATCAA
CGAGTGTCTCATCTGACCATGTGTCCTTAATTCTTAAATGCGCTACTTTAA 
GCAATGTCATGTGCTTTCACACCTTTCTTCAATGCCAGTCGTTTCTCTTTA 
GAGGACATACGAGGATCAATCGCCATTTGTAAAGCATCTGATAAAAAGTTA 
AACGCCATTACAATTATGACGATAGCCACACCTGTTGTCATCATCATTCCA 

GGATGTGTGAACATTACTTTTCGTGCTTCATTAAGCATCATCCCCCATTCG
GCTGTAGGTGCCTTAACACCTAATCCAAGGAACGAGAATCCTGACATTTGT 
AATATCATTGAGCACATCGAACTACTAGCAATAATCGCTATGTCAGTAAAG 
GTTAGTGGCAAAATATGTTTGCGAATGATTGTTAAATCATTCATACCAATT 
ACTTTGGCAAATTTTACATGATCAGCTTCAATATATTGCATTACACTGGTT 

CGAATCACGCGACAAAACCACGCCCATCGAGTCAATATAAACGCAATAATA
ATATTTTCTACACCCATGCCAAACAACGTAATCAATGCCAATGTCACCACA 
TAGCTTGGAAAAGCTAACATCACATCGCATATACGCATAATTATTGCATCG 
ATATAACCTGGGAAATAACCTGAAATAAACCCAAGTATCGCTCCTATCACA 
ACGGAAATAATCAATGCGACAAATACATATAACAAACTAGGTCTTATGGCG 

TATATTATCCGTGTTAATACGTCCCGGCCTAAATGGTCTGTCCCCAACCAG
TGAGACCAACTTATACCAGCAAATTTATTTGCTGTATCAATGTGATTCGCT 
TCATAAAATGTAATTAAAGGAGCTAAAACTCCAAGCACTACGTAGATTGTA 
ATAATAGCTATAGCAACTATGGCACCTCTATCTTGAAATAGACGACGTAAC 
ACAATCATTTCATTGCCTCCCTTAAACGCGGATTTAATATTGCATTAATGA 

TATCTGCAAGTGTATTAAAAAGAATAAACAACACCGCGACAATGAGCACGT
ATGCTTGAATTACTGGAAAATCATGTTCAGTAATCGCTTTAACACTTAATG 
GTCCTAAACCTGGCCACGCAAATACATTTTCAATAACGACTAAACCTCCCA 
TGATCATTGGTATCGACATACAAAAAATAGATACGGCTACCTGTAATGCAT 
TACGTAATACA 


Sequence 3423 

step.1002h08.cons.ok

CTGGATGAACTTCAGCCATAACTCCATCAGCACCAACGGCTAAAGCCGCTT 
TCGCTGTCGGTAACATAATATCTTTTCTACCTGTACTATGCGTTACATCAA
CCATAACAGGTAGATGTGTTCCTTGCTTTAAAATAGGTACAGCAGAGATAT 
CTAATGTATTTCTAGTTGCTTTTTCGTATGTACGAATACCTCTCTCACATA 
AAATAATATTTCGGTTACCTTGTGAAGCAATATATTCTGCTGCATAGATAA 
ATTCTTCTATCGTTGCTGATAAACCACGTTTCAATAAAATAGGTTTATCTG 
TACGACCAGCTTCTTTGAGTAACTCAAAATTTTGCATATTTCTAGCACCAA
TTTGGAAAACATCCAAATAATCTGAAGCGATTTCGAAATCATTTGGGTTAA 
CTATTTCACTTACTACATTTAAGTTATATTTATCTTTAACATTTTTCAAAA 
TTTTGAGACCTTCTACTCCAAGACCTTGGAAATCATAAGGAGAAGTTCTCG 
GCTTAAACGCTCCGCCACGAATAAATTTCTCGCCTTTAGCCTGCAAATCTT 
GTGCTACAGCATCAACTTGTTCTTGTGATTCTACAGAACAAGGACCAAATA
CAAATGATTTATTACCATCACCAATGATTCCACCATTATCAAATTTAACGA 

TCGTATCTTCAGGOTTTCAATl^ACGTGATACATAAAGATGTTTTTCA^ 
CTGATTTTTGTAAATCTGTAGAGGCTTTAAAAATTTCTTTAAATAATTGCT 
TAATCACATTGTCGTTAAATGGACCTTTATTTTTATCTAATAGCTGATTAA
TCATTTCCTTTTCACGCTGAGGATCATATACTCGTGTACCTTGCTTAATTT 
TTTCTTCACCAATTTTCTGTGCTAATTCTCCTCGCTTCGATAGTAATTCTA 
ATATTTGATCATTTAAGGAAACAATTTCTTCTCTATACTGTTCTAATTTAT 
TAGTCATTTCTACTCACCTCTT^GATATAGTTACGCATAAAGTATGTTACT 

TTAAAGCGATAAAATTAAAATTATTCAAATCTAATTTTAACAATTTCACAT
TTATAGTTCAAGCGTCAATTTACAGAAAATACTAAATCTTCTGTAAATTTA 
TGTAAGTAATAATTTGTTAAATAATGATATTACAATATTATTTAAATGTTA 
AAAAAGATAGGAACGAACTGAATATAAATTCGTCGTCCTACCTCAAGATTT 
CAGATATTATTTAAATATCTAATTTTAAAGTGAAATAATCTCTTAAATTAA 

TGACTTAAATTTAAAATTTAACATAATAATTACAATAGTTAGTAAATCGTG
TTAAGTAGATAAAGAATATAACTTTTATAATGTTCAATACACTTCATTTTA 
CTCTTTATCATTGTTAATATTTAAAACTCGAGATTAATCATTAAATGTTCG 
TTTTTCAATTTTGCTACTAGTTTTTTCTACTTTTTGTTGTTTATTACGTTT 
GTTTTGGTTAGGTGTATGTTGCTGTTTGCCTTTGGAATCCTCTTTAGTTGC 

TGTTATATTCTTAGTCTGATTCGAATTTTTTTGGCTGTGCGGATTATTTGA
ATGTTTTTTAGATTCAGTAATAAGTGTAGGGATACGCTTCAATTTTGTTTG 
AGCTTTAGCAATTGATGGTTCTTTTAACAACGCTTCTGTAAGTTGTGTTTC 
TTTAGAGCCATTTGTGAGCTTAGATCTTTTTTCTTTTGCTGCCTGTGCTAA 
GCGATAAGCAACTGAATTAGTATCTGATGCTTGAGATTGATTTAAATTAGT 

TACAAGATTTGGAACATCAGAGTTATTTGAAGCTTTTATAGCTTCTTCTTG
CATTAGATTTTTCGTGTTACTAGCAACTTCATTATTATTATTTATTCTATC 
TTTTTTTGCTTTTGCCGCTGCAGCTAATTTTTCTGTTTCAGTTTCTGACTT 
ACTTTTAGTATCTTCTCCACTTGCAGAAGGGTTTGCTAAATCATGATTGCG 
TGCTTCTTCTTTAACTGCCGTTTGAGCAATTGCAATTTCTTCAGCAGAAGG 

TTCTTTCTCACTATCTATGTGCGTCGTAGATGTTTTATTGTTTGAATCAAC
TTCACTTTTAGCTTCTTGAATTGCCTGCGCTTCCGGTGATTCATCCGCTAA
GTCTGTTTCGTGTTGAATAGCATTTTGTTGAGCTGCTAATTCTGTTGCTGA 
TGGTCTCACATCATCCACTTGTTTTGTTTCTTTTTTTGCTTCTTGAATTGC 
TTGCGCTTCCGGTGATTCATCCGCTAAGTCTGTTTCGCGTTGAATAGCAGC 

TTTTTGAGCTTGTAATTCATCTGCATGGATATTATCTTTGGAATCATATTT
TAATTGCTCAACTTTAGATTTAACAGAAACCACTGTATCTTCCGCTCGTTG 
TCTATAGTTTTCAACTTTACTACGTAAATCTTGTTCTTTTTCTTTCAATTC 
ATCCGTTTTTTTATATATTTTATTTTTATAGTACAATCCAACAGCAGAACC 
TACTATCGCACTTGCTATGAAACTAACTACAAAATCCTTACGCTTCAATGA 

GTTAGATTGTGAACGAGATGTGCGATGATACAATTCTTCGCTTGTATGGCT
TTTCTCGAAATTATCACGATTATAATGTTCCATAAACTTTCCTCCTAAAAC 
AAAAGCACTTAAATAAGTAAAAGCCTCATTTAAGTGCATTTGTCATTAAAT 
AGTATTAATTATTTATTAACACGTGAATTGTAGCTATGATTTGCATCATCT 
GCTACAGTGTTTGTTTTGTAGTTTGCACTTCCTCTACGATTGTATCTGTTT 

TGCCATTTGTCAGCAATTTCCATTGCAACATTTGACeATTGTACAACTTGA
GAAATTTTATCTTCATTTTGAGAAATATTGTGCGTTU^TTGAGTTTGTAACT 
CTATCTACAGATCCATTTAAGTTTTGTACTGAGTCTCCAATGCCCTTAACA 
GCATCAACTACAGAATTTAAACGATCAACTTTACCTTGAATATCTTCAGTT 
AAGCGATTCACTTTATGAAGTAAATCAGTCGTTTCACGAGTAATACCTTGA 

ACTTGACCTTCAACACCATCTAGTGTTTTAGCCACATAGTCTAAATTCTTT
TTAACAGAAATAAGTACCACGACGATACCGATACATAAAATTAAAAACGCT 
ATCGCAGCGATGATTCCTGCAATTGGTAAAATCCAATCCATTAAAAACGCC 
TCCTAATTACCATAAAATAAAAGTCATTAATATTTAATACCCATGATGCTT 
TTATATAAACATATTAAAAATCATTTTTTACGCCTAATTTTTCGAAATAAG 

CTTTTAGTAACTTTTGAATGTCACCAGCGCCCATAAATAAAATTACAGCGT
TATCAAATTTTTCTAATACATCAATACTATTTTCATCTATAAGCGTCGAGC 
CGTCAATACGATTGATTAAATCTTCTATAGTTAAATCTCCCGTATTTTCTC 
TTATTGAACCGAATATTTCACATAAAAATACTTGGTCT 
Sequence 3424

step.1002h09.cons.ok

GAGACGTCCAAGTTGAGGATATGTTTCAGTATTACCATGAGTCATACCCAT 
ACCGCCACCGATAGTCACATTAAATCCAATTAACTCATCTTGTTC7VACGAT
AGCGATTAAACCAATATCTTGAGAATAGACGTCAATATCATTAGATGGTGG 
TACTGCAATACCTATTTTGAATTTACGTGGTAAATACGTATTCCCATAAAT 
AGGTTCCTTTTCCTCACTCGAATCTAAAACCTTTTCACCATCAAGCCAAAT 
TTCATGATATGCATTTGTTCTTGGAAGTAAGTGATTACTTATACGCGTTGC 

ATAATCATTAATCTCCTTATGTACTTGAGATTGATAAGGATTAGGATTGCA
CATCGTATTACGATTAACATCTCCACATGCAGCAATAGAATCAAGTACTGC 
ATGATTAATATTTTTCATTGATTGTTTCAAATTACGTTTAAGAATTCCATG 
AAATTGAAATGCTTGGCGTGTTGTTAATTTAATCGTATGATTTGCATATTG 
ATTAGAGATATCATCCATAGCAATCCACTGTTCAGGAGTCGCTTTCCCCCC 

AGGTACACGAACTCGAATCATAAAACTATATGCAGGCTCAAGTTTTTGTTT
ACGACGCTCATCTCTTAAATCCCTGTCATCTTGCATATAACTTCCGTGAAA 
CTTTAGCAGTTTTGCATCATCTTGTGTAATGGATCCAGTGATTGGATTAGC 
TAAACTTTGTTCAATAGTTCCACGCA7VAAAGTCACTATTTGCTTTTAAAAA 
TTCCATTTCATCAAGATTTTTATCTAATTCTTCCGAAATATGATTATTTGT 

ATTAACCATGTATCCCCCGTCCCTTCTATTAATACACGTCTCTTTGATATC
TTTTATCTCTTTTCATTTGTTTTAAGTATTCTTCTGCATCTGTTTCAGATA 
GGTTTTGCTCTTTGATTAACACATTTTTAATCGCTTGATGAACATCCTTTG 
CCATTTTACTTTCATCACCACATACATAAATAGTAGCGCCATTTTCAATCC 
ATCGATTAAATTGTTCACTATTTTCTGCAATTTTATGTTGCACATACACTT 

TTTTATCAGTATCTCTAGAAAAAGCAACATCTAATTTTGATAAAGTTCCAT
CTTCAAGCCATTCTTGCCATTCCGTTTGATACAGAAAATCTGTAGTGAAGT 
GTTGATCTCCAAAGAATAACCATGTATTTCCTTCA7VAACCTAGTTCCTCTC 
GTTCTTGCATATAGGATCTAAACGGTGCAACACCTGTCCCAGGACCTATCA 
TAATCACAGGTGTTGATTCATCTTGCGGAAACTTAAAATTCGGATTTCGTT 

TTAAATAGATAGGAATTGTATCGCCCTCTTGTATTCTCTCTGCAAATTGTA
CTGAACAAACACCTGAACGTTCCCGACCGTGTGCTTGATATCTAACTGCTC 
CAACAGTAATGTGAACTTCATCTGGTGTTGCTTTATAACTACTAGATATTG 
AGTACTCTCTAGGTGGTAACTTTCTTT^TAATTGATGTAAATTTTCAGGTT 
GTAGTTCTGTCGTTGCGAAGTCATTTAATAAGTCAATCAAATCCCTTCCCT 

CAACGTAGTTTTGAATCCATTCTTTATCTTGAATTTTTTCAGAAAGCTCTT
CATTATCAAAAAATATCGCAGCATTTTCTATTAACGGTTTTGTTAATTTAG 
TAATTTCAAAATGCGATGTTAATGCCTCTTCAAGATTTAAAGTATCTCCAT 
CCTCGTTAATTAAAACTTGTGTTTCTGGACTCCAACCTAATGTGCTAATAA 
GTAAGTCTACGATAGCTGGATCATTTTGAGGCAAGACAACTACACAATCTC 

CCGGTTCATATTCCTCACCAAAATTATCAAGTAATAACTCAATATGTCTCG
TCTCTTTGTCTGAACCTCTACCATTCAAATTAATATTGGTCAATACTTCAG 
CATCGTAAGGATTAGACTTACTATATTTCTTTTCTTTTGCTGATTTTATAG 
ATTCGCTAACAACTTGTTCACTTTCAGTACCAGCCGGTGTTGAATCAATCG 
TATTAATAACATTGGCCATCCACTTTTCAGCGTCTTCTTCATAATCAACAT 

CGCAATCAGTACGATGATAAAGTCTTTCTGCCCCTAATTCAGCCAAACGGT
TATCAAAATCTTTACCAGTTTGACAGAAGAATTCATAGGTTTGATCTCCTA 
ATGCTAATACTGAAAATCTCACCCCATCCAATTTTGGCGCTTTACGTCCGT 
GGATGTATTCATGTAATTCAACAGCGTTATCTGGTGGATCTCCTTCGCCAT 
GTGTAGATGTGATAATAAATAAATCTTCTACTTTCTTCAAATTCTTAGGTT 

TAAAGTCATCCATTGATTTGAGCGTAACGTCATTTCCAATATCAGATAAAC
GTTGTTCAAATATTTCTGCAAGTCCTTGCGCATTACCAGACTCAGATCCAT 
ATAAAATTGTTATCGATCTAGCTTCCGGCTCAACTGAAGGTTCCTTTTCAT 
GTAGCATCGCTTCTGTATTGTCATCTGAGTTATGTTGTTCTACAGAATCTG 
TAGATGTATTTGATTGTTGATTTGCCATCAGATAGCCACTTAACCATACTT 

GTTGGTTAGATGATAAAGTGTGAAGCAATTCATTGATTTGTTTAGCTTGCC
CTTCCGTAAAAGGACTATTGGTAACACTTAAGTTCAATCTTATCCACCTCT 
TATACTTTGTTATTCCGATATTTGTCATTATTTATATATGTCTATTTTTGA 
AAGTTATTAATTATATCTTCAATTCTAGTATGTACCATTTGTTATTTTCAG 
TAAATGTAATACTCAATTGATAACTAAAATCGAAAAGACAATTCATAATAT 
CAAAGGGCAATACAAACTTTTATTACTCAAAATTATCGTTACTAAATAAAG
CTGTATCAGCTTTTAACAAACAACTCTTTATTTCITATCGGAATACTATGT 
TTTGAAATTAAAATTTTTTGACTATCTCTTATGGTTTATCAGCT^ 
ATCCACATTCAGTCTTACTAGAATTGGACCAACGACCAGCACGTGAATCAT 
TAGAATCAAATACGGGTGATGTACATGGAATACAACCAATACTTGGATAAT
TTTGATCATGTAATTCATTATATGGTAAATCCTTATCACGTATA 



Sequence 3425 

step.1002h10.cons.ok

TAGGTAAACAATGATTGATACTTTGATTTGTTCATTGCTTTATCATTTAAA 
CAAACATAGTGATTTTAATTTTTATCATATAAAAGAATGAAGATAAAAAAG 
ATGTATGTGAAAAGTGTGCTCACATACATCTTTTTATTAATATACATCTTA 
GTTAGCACCAACTGCTACTATAGAATTTGATGTGGATGCATCATCGTAAGA 

ATCAATTACTTTGAATTTTTTACCGTTAACTGTAAATTGTTTGTTTTTAGT
ATTGTGATATACCCAAAAGCCATTTATTTCAGCTATATCCCTATCATTAAT 
TTTTAAATCCGTCATGGTCGAACACCCTATACTCATTAATTTGATTGTTAC 
TTATGTTTGGATCGTTGATTTTTAATTTGTTATTGCTAAATAATAAGTGGA 
ATGTGAAATTATGTGGAATAGATGGCTCTTTTCTCATTTCTTTAGCCATGG 

ATAAAGTATTTTTTCTAATGATTTTTTCGTCATATTCGTCCAGAGTACTGA
AAAGTGTAGTCGTTATTTCTGTATTAATATACGTATCTTTTTTAGACTTTA 
ATTGTTTCATATGCTTTTTAAAATCATCGTCACTTTCTTTAATTAAAGGTT 
CAAAATGTTTTCTATAATCTCGTAATTTCGTGGAACTAGTAGTAATGTAAA 
AGTAATTATTTTGATAACCTATATTTTGAGTTTTAGTTATTGCTTCATCGG 

TGAATCCAGTATAGTGATATTTATCTTCTTTATCTTTGAGGTATTTTTCTA
AGTTATCCATTTTTTCCTTATTTGCTCTATACTCAAAACCGCTTAACACAG 
TTCCTACCATCATACTCATAGTATCTCCGTTATCATTGCTTCGCATGGAGC 
CTTCTTCATGGATTGATTCCATATCAAAAGGGATGCTTGCATTAAACACAA 
TGTCATGATCATCGCAATGAACGTAAACTTCGGCGCCATCACCACTTCCTA 

CAACATTTGTGGCTTTAACATTAAGGCTAAAATTATCTTTAAAAAATTGTT
CGCCAAGTTTTTCGTATTCTTCACGATATTTTTTCGCATGTGCGATAGCAC 
TTTTTTCAGCTAATGGTTGGAAATCTTGACCTTTGTATTGACTTGCAGGAA 
CTTTTTCTATTTCGTCTTTCATTGTCGAACAACTTCCTAATATTAGTGTTG 
AAACGATAATTGTGGCGATTAATTTTATCATGATACACCTTTCTTTACTAA 

TATTACATCTTAATTAATATGAAATACATTATTTAAATCAAGCCCAAAAGC
TATTTAATTCAGCAATATCTCTATCATTGATTTTACTCATGTTTATGCACT 
CCATATTCTAATGGGTTGTTAGCATCATAAAATGCGCGTGTAGAATTAATT 
TTTGGCTCACTTAATTGTAAAGTTATGTTTGTGTTTTCAGGTTTTTCTTTG 
TAATGATATAAATCATCACTTAAGAATAATAAATCTTTCTTAGAGTTATTT 

TTAGTGAAATCAGATTTAGTACTGAATAATGTTGAAACAACTTCAATTTTG
GCTTTGAAATTTGTTGCTTCATTTGCTTTTTTCATGCCTTCTTTGAAATTC 
TTAGAATCTTTTTTTATTAAAGGTTCATAATATTTTCTATAATTATTAATA 
TTAGTAACGTTACCTGCTAAATAATAATACTCATTTTGATATCCAACATTC 
TGTGTCTTATGTATTGCTTCTTTAGTAAAACCGGTATATTGATATTTTTCT 

TCATTGTCTTTAAAGAACTTTGTTAAGTTATCATATTTTTCTTTTTGTGCT
TTATATTCAAAACCACTTAACACCGTTCCAACCATATTACTCATGGTATCT 
CCGTTATCATTGCTTCGCATGGAGCCTTCTTCATGTATTGCTTCCTTATCA 
AAAGGGATGCTTGCATTAAACACAATGTCATGATCATCGCAATGAACGTAG 
ACTTCGACGCCATCACCACTTCCTACAACATTTGTGGCTTTAACATTAAGG 

CTAAAATTATCTTTAAAAAATTGTTCGCCAAGTTTTTCGTATTCTTCACGA
TATTTTTTCGCATGTGCGATAGCACTTTTTTCAGCTACTGGTTGGAAACCT 
TGACCTTTGTATTGACTTGCAGGAACTTCTTCAGGAATTGATTGTGATGAA 
TGTTTTGCTTCTTTATGCACTGTGGAACATCCTCCTIATTAATAATGTTGAA 
ACGATAATGGTAGATATCAGTTTTCTCATGATACACCTCAATATAAGATTA 

GTTACTTTAATTTTAACTTAATTTATAAATTATATAAAAAGAATTTATAAA
AGTATTGAACAAATATAACATTTTTCACATATTAGTTATATTGATTTTATA 
GCAATTTCAAACATATTCGAATAAAAAAATAAAAAGCTTACACAATGTAAT 
TTGATACATGCATAAGCTTTGTCATATGATTAACGATTTTTATTCTGTTTA
TCATTTGTTCCATGATTTTGATTGTTGGAATTATTATTTGAGTTGCCAT^ 
TGTTGTGAATCAGATTGATTATTATGTTGTGACGACTGTTGTGGTGTTGAT
TGTGCTGGTTGATGATATGATTGATTAGTTTTGTTCTGATTCGGTTTCTGT 
TTGTTATTTTCTGAAGAGTTATTTTTATTTGAACCAGACTGTTGATTATTA 
TTTGGGTATTGATGGTAGTTAGGTTGTTGTGCTGGTTGTTCAGTACCTGAT 
TGATTATCGTTGTTATAGTCATTATTTGTTGGTTGAGTTTGGTCTTGGTGT
TGATTAGTATCACGATTTTGATTTGAAGTATCATTTTGTTTGTTTTTGCGA 
TTGTGTTCTATTTTTTTTATCATTCTTATTAGAATTAGTATGGTGGTTAAT 
AGCTGCATATAATGAAAAGCCTAATGCACATATAAGAATGATTAC 


Sequence 3426 

step.1002h11.cons.ok

TAACCCAGCCACTCGAACCATAAAGAATTCATCTAAATTTGAACTGAAGAT 
AGAAATAAAGTTAAGTTTTTCAAGCAACGGATTATTTTTATCATATGATTC 

TTGTAATACTCTGTAGTTAAAATCTAGCCAACTTAACTCCCTATTGTTGTA
ATACTGCGGTAAATTAATATCTTTTTCTCCCAATCGAGTTTGCATACTATA 
CATACACCTCATTAATTAATCTTTATTATGCCCAATCTAAAATATCATAAT 
AATTTTAAGATTTTGTAAATATAATAGAAACCTTACCTTTTAAAATTTTCT 
CAATATGCTTTTTCTGACGATTTGCTTGGTATTCTTCTGCAATAGGTGAAC 

CTTTGTAATAAACTAATAAATCGTATTTGTCATCTTTCTTTGCTTTTAGTT
TT^CTTCCTCTACAAAACTAGTATGTGAGATATTCAAGGTGTTTGCAAATT 
TAATAATCCCTCCTAAAGCTTGTATTGTATCTATTTCTTTATTACTAAACC 
ACTGTGTTTCTTTGCAATAAAATTTAAGTAAAGATTTGTTTTTAAAACTAG 
CTAACAAAGCTAATTTCACACGATCTTTATGTGAAAAGCCGTTAATCATTG 

AATTTGCGATTAAGTAATACGTATGTGGTGAACTTGAGTCTGAATCAATGA
AACTACCTAGATAATAAATGTAGGCACCTTCAATAAATAATTCTTTTTCCA 
TTTCTGAAATATTAAGTGATCGTTCACTTATAATTTGATTCAATAAGGATT 
GAGCTAATTTCACACGACGATTAGCACTCGTTTCTTCAATATGATATTCGT 
TCGCTAAATGACGTAATGCATCTTTACGTACGTTACTTTTATTAAACTCAT 

CAGGATATCGTTTGCTGATGTGGTTCATAATAAATCCTTCACGAATTCCTT
TTCTTGAGAAGGTGAATTGTGTGGCGTCAATTTTTTTGAAAAGTGTTTTAA 
AGACGGAGATAGCCGGCAGAATAATATCGACGCGATCGCGACTTAAACCGT 
CTAAATTTGTTAGTTCATCGCGAGAACTTTTACGGATTAGATCATAAACAT 
TGTTAATATCTTTCGAAGTCATCGTATAGTTATGAACGCCACCGATAGGGT 

ATGCATGTGCTGATTGATGAATGCGTGCAACATTACGTGCAGAACCACCTA
CTCCTACGAGCGCAATATGTTGGTTGGATAGCCAGTCTAACTGACTAAATT 
GCTCACGTAAAAACTGTTCCATATTTTTAATGGCTGTTTTGTCATTATGTT 
CTTTATCACCAAAAAACTGACGCTTAAGTGATACCACGCCAAATGGAAAGC 
TATGAGCCTCTTTAAGTTGTTTGTCTTTGAAAAGGGTAACTTCGGTAGAAC 

CGCCACCGATATCGACAGAAATTCCATTTTCAATATCAGTAGTATGTGTAA
TCGCATAGTAACCGTAAAATGCTTCATCTTCTTCAGGTACAATTTGAATTT 
CGATATGAATATCTTGTTTAATTTCTTTAATGATAGCTTCACGATTTTTAG 
ATTGACGGATAGCAGCTGTTGCGATGGGATATAATGCATCAACGTTAAATT 
TATCCGCAACTTTTCTAAAACTGCTTAAAGTCTCTTTTAAAACATGAATAC 

CTTCATCATTCATTTCATTGGACTTAGTGAGATATTGACTTAAACGTGCAG
GTGTTTTTATATTCAGTATTOCATTGAGCCCAGTTTTTTTATTGTAGCCAA 
ATATAACAAGTCGAATCGTGTTGGAACCAATGTCTATCAAACCAATTCTTT 
CTTCCATGTTTGCCTCCTATTATTCAAACAATGCGTTTAAAATAATCCCTA 
TTTAATTTTATTGTACATGATATAGATAAATTGAAAAGAACTTACAGTTAA 

TTTTTCTGAAATATTTAAGATGATTGATAAGTAATTATAATTTTTAAAAAT
ACAATTTTAGTAAAAAGAGAAGTATATAAATGTATAGCTTAACTCAGCTAT 
TCATTCTTCGTGCGTCTTATTCTTATCACTGTTTAAGTTGATTCACATCGT 
GGAAAACTAAAACTAACATGATACCTAATTGTATTAAGTATGTACAAGTGA 
TAGGAGAGTGACTTTGGATGGAACGTTTAGAAAACAAAATCGCAGTGATTA 

CTGGTGCGAGTACTGGTATTGGACAAGCATCGGCCGTGGCGTTAGCAAAAG
AAGGAGCACATGTGTTAGCGCTTGATATATCAGATCAATTAGAAGAAACTG 
TGCAGTCTATTAATGATAATGGTGGGAAAGCAACTGCATATCGCGTAGACA 
TTTCAGATGATAAACAAGTCAAACAATTCTCAGAAAAAATAGCACAAGAAT 
TTGGACATGTAGATGTTATTTTTAACAATGCGGGTGTAGATAATGGCGCCG 
GACGTATTCATGAATATCCAGTTGAAGTGTTTGATAAAATTATGGCTGTTG
ATATGAGAGGAACTTTTTTAGTAACTAAATTTTTATTACCTTTAATGATGA 
AACAAGGTGGTTCTATTATTAATACAGCTTCATTCTCTGGGCAAGCTGCGG 
ATTTATACCGTTCAGGGTATAATGCTGCTAAGGGCGGTGTCATTAATTTTA 
CAAAATCTATCGCTATAGAATATGGACGTGAAAATATTCGTGCTAATGCTA
TAGCACCTGGAACAATCGAAACACCACTTGTTGATAATTTAGCAGGTACAT 
CAGATGAAGAAGCCGGACAAACATTCCGAGAAAATCAAAAATGGGTAACAC 
CATTAGGTCGACTAGGAACACCGGATGAAGTTGGGAAACTTGTAGCCTTTT 
TAGCTTCCGATGATAGTTCATTTATAACTGGTGAAACTATTCGTATTGATG 
GTGGCGTGATGGCTTATACATGGCTACAACACGCATTTTCATTTTTTGTCG
TTGTTTTTTCTTATTTTCTTGTGACGTATATTGGTTACCGATATGTTCTAC 
TTTTTTATTCATTGCTTGTCACCTCCTTAAGCATTTCACTCTTCATTAATA 
CGTTCTTCTTTAATC 


Sequence 3427 

step.1002h12.cons.ok

TGTTCTAAATTAGCAATAACACTTTCAAGAGACGCACGTGTAGGATTTGCA 
GTACGTGAATATTCGTACCCTTGTCTTAAATCACCAATATCATCTTGT7UVA 

TAAGTACTTGTTTGATAAATAGGTGTTGTCACTGCTCCAGTATAGTTGTCT
GTCGTATGTCCCCCATGTATCATTTGCGTTTTTTTATTCATTATTAAAACT 
CTCCTTATAGTTGAATATTTGTTTGGACATGTATCGATCGCTTCCATCTGG 
AAAGATGGTAACAATCACACCATTTTGAATGCTTTTTTTTAATTCCAACGC
TCCTTGTAATGCCGCTCCCGAAGAACTACCAACTAACAATCCTTCTTTATT 

CGCGACAAGTTTAACATTATTAATU^GCATCTTTATCGGCAACAGTAAAAAT
ACCATCTACTAATTCTTTTTCTAAAAATGATGGCCACTTTTCAGAACCAAT 
CCCTTCAGTTGCATGAGGATGACTGACACCACCATTTAAGACAGAGCCTTC 
TGGTTCTACAATATAATTTTTTACATCATACGTTTTTAAGTGTTGTGCAAC 
TCCTGTAAACGTACCACCGGACCCAACACCTGCCACAAAATAATCAATATG 

TGAAAGTTCATCTGTGAGTTGTTTGGCAAGTGTTTGTGTATATGCCCCAGG
ATTATCTTTAGTTTCGAATTGATTCATATATAAATATCCATATCGTGTTGC 
GTATGCCAACGCCTCTTGCTGTGCGCCAGTCATTCCCTCAGCTTTGGGGGT 
ACGTCTAACATCTGCACCCAATGCTTTCATAATTGAAATTTTTTCTTCTGC 
AAATCCTTCTGGAGCAAAGATGATACATTTTACTTTGTGCCGATTAGAAGC 

AATAGCAAGTCCAATGCCTGTATTACCAGCAGTCGCTTCGACTATTGTATC
CCCTTCTTTAAGTCGTCCTTCATCTATTGCTTTTTCAATTAAGTACTTCCC 
TAGACGGTCTTTGATGCTACCACCAGGATTAAATTGCTCAAGTTTGGCGTA 
TATTTTAACATTCTCGTCACTAAAGCTTTCTAATAAAACTAATGGAGTTTG 
TCCTATCAAATCGTATGCAATCATTATTTTTGGTTCAGATGACAACAATGA 

GTATCACTGAACTTAACCCACTTTCTTTATTTATGTTAGCGCTAATTCTTA
CTTATCAGATATGAATTATAACTTATCGGCATTGTTTTTGCAATTTCTGAA 
AAACGTACAATCAATTTGAAATTTAAACATTATTTAAAATAAGTATTTTTA 
TAATTGATTACGTTTATATGAGAATACATACCGAAAGTGTTATGATAGGGG 
TAAATAAGCTAGCACATCGATAAAAGTTATCGAATTACATACACTAGAAAT 

GATAAGGAGCTGCTTTAATTATGACGAAACAAGATTTATCTTTATCTGTAT
TTACCAATGAAAATTATAAGAATCTTCGTTACACATCATCTAGTTTTAGAA 
ATTCTATGTATGATGAATTAGAGGTTAATAAAAGTCGTTTTAAAAACTGTA 
ATTTTAATGAAGGTATTTTTAAGAATATAGAAGCAATTTGTAATTGTAAAT 
TTACAACGTGCGGGTTTAATAATTGTATTTTCGAAGATGTTCATTTTTACA 

AAAACCAATTTAAAGATTCAACATTTGTGAATACACCATTTGATCAATCCG
TATTTAATAGTACTTTATTCCAAAATGCAATGTTCGATAGCAATCTCATTC 
GTAGCGTAAAATGGACTGATATCATTTTTAAAAACGTTTCTTTCAAAAATG 
TAGAAATTGAAGGAACAACATTTAAAGATGTAAAATTCAAAAATTGTGAGT 
TCAAAAATGTAATTATTACTAATTCAACTATGTCGCAAAAGTTAATGAATG 

AATTACAAAAACAAGATGTTACTTTAGAAAATATAGACACTTCTATTTAAC
ATTCATTATCTTCCGCTTAAGAAAACAATTTAAGATCATAGTTAAACATTC 
GCTCGTTTAATGGTGTTAGATATTTCGAATCAACTACATAACATTCTTTAT 
ATAGTAAAACGGTCAACTCCACTAAAAAATGGAATTGACCGTTTGTTTTAG 
TTTTATCAATAACGTCTCAATGACTTAAAATAATTGAACTATAAATACTAA 
AATAATAACAATAGGCATGATGAACTTAATTAAATAATACCACGGTAAAAA
TAGGTTGAATTTGTCTTTCCCAAAGCTTTCTTTTAATAATTTTTTATCTAG 
TAATTGGCCAACCACTAATGTTGTTCCTAGTGCCCCTAATGGCATAAGAAT 
ATTAGATACAATAAAATCCATATTATCAAATATCGTACCAGCGCCAAAACG 
CT^TGACTTAGACTACTAAAAGATAATGTTGCTGGGATACTAATGATAAA
TACAAGTATACTACCTATGATTGCCACTTTTTGTCTTTTACTATTATCATT 
TTTAGTAAAATTAGATACATTTAACTCTAATAATGATATAGAAGACGTTAA 
TGCCGCAAATAAGAATAATAGTAAAAATATAAAGTAAAAGAATGTACCAAA 
AGTCATTTCGCTAAATACTAGTGGTAAAACCTTAAATAATAAGCCAGGGCC 

TTCTTGGGGTTGGTAACCAAATGTTTTAAGCGCAGGAAATATAGCTAATCC
AGCCAAGACAGAAATTAAAATATTCATTACGACAATTGAAAGTGCTGAAGA 
CTTTATCGTCATATTTTTAGGTGCATAGCTTGCATAAGTAATCATTCCGGT 
TGTACCTAGGGACAGCGTAAAAAACGATTGTCCTAACGCAAATAGTACACC 
TTGAATAGACATATCTTCAACTCGAGGTTGCAGTATATAACGTACACCTTC 

TAAAGCACCTTCTAAAGTTAAAGATTGTGCTACAACGATAATTAAAAAGAT
AAATAATAGAGGCATCATTATTTTAGAAGCTTTTTCTAAACCTTTTTCAAC 
ACCTAACATAACAATTACCATTGTTATCAAAATAAATATGCCTTGACCTAA 
AACAGTCAACCATGGATTACTAATGATTGTTTCAAATTGAATATTTGTCAG 
CGTACTTGATTTAAAAACCATGATTTGTGCTATGACATAACCTATATA 


Sequence 3428 

step.1003a02.cons.ok

AAAATCAAATCCGGTTTAGCTTTTATTAATTCTTCTTTATTTAAATTCATC 

GCATCGAATTGTTTTTTACCTTTTTTTACATCTTTAGGATAATCATCCACA
GTGGATACACCAACTATATCTTCTCCGATTCCTAAGCGATATAAAATTTCT 
GTGTTACTAGGAATGAGCGAAATAATTCTATGATATTTTTTATCGCTTTTT 
TCAGTGTCTTTATTATTAGATGAATGATTGGCATCTTGACCACATGCAGCC 
AGTACTACAATCAATGCACATATAACTATCCATAGTTTACTTTTCATTTGG 

AGCATCACCCTTTCACAAATTATTTTTAATGTTCCTATTCATCATATCAAA
ATATACCAATATTTGTAGATACAATTTTATTTGATTTTCACAAAATATGAA 
AAAAGACAACTTTGAATTAAATCTATTAATATATTTAACCCCAACTTGCTT 
TGTCTGTAGAATTTCTCGTTGAAATTCTCTGTGTTGGGGCCACAACCCCAA 
CTTGCTTTGTCCGTAGAATTTCTCATTGAAATTCTCTGTGTTGGGGCCACA 

ATCCCAACTTGCTTTGTCCGTAGAATTTCTCGTTGAAATTTTCTGTGTTGG
GGCCCCGACCCCAACTTGCCTTGTCTGTGGAATTTCTCGTTGAAATTTTCT 
GTGTTGATCAGTTTTATAATGTTAGTATCAATCGGTAAAATGCTAAAATGT 
CTGAGACAATCAACAATGTCCCAGACACATAATGTTTTATAATTTAATAAA 
TGTTTTTATCGCGTCAGAAATGTATAAATTCAAACCACGTTCACGTTGCTT 

GTTAATAAACTTTACAAATCGCTCGTCCTCAATATAAATCTGAGCTATATA
TTCTAAAAATTGATTGTCACAATTAGGAACTTGTTCTTTCAAAATACATTG 
TAATTTCTTAGTTTTTCCACTTGCTTCAAGTATAGAAACTTTGTTTAAGTA 
TAGTTGATTCATTTCGTCAAAAAACATATTTAATTGCTTATTAATTTCCTC 
AAAATGATTTTGCTGTTCTGATTCATCCTTACATTTTTGTTTATCTTTATA 

TGCTTGATAATAATGAGTATCTCO^TACTTACTTGCTGCTTCTTTTTCATA
CTGTTCATTTAAATTGAAATTTTGCATTGTAGTTACCTCAATCTCATCGTT 
ATCTTTGTGAAAATGATGTTCTAACGTACGGATAATATCGTTTAAGCGATC 
TCTTTTTTTAACAATGTGATAAAAATTATCATGTAATATTTTAGTTCTAAA 
ATCACTATCACCTTCAAAATACGTCTGTATTTGCTTTAATGTTAAATCTAA 

TTCCTTTAAAAAGAGAATATATTGTAACTTTGAAATATCTTGAACATTATA
ATATCGGTATCCATTAGCACrCATTTGTTGTGGCACGAGTAACCCCTTTTC 
ATGATAATAATGAAGCGTGCGTATACTAACTCCGGTAATGTCAGAAAGCGT 
TTTGGGACTGAAGCGTTCTTTCATCATTTCACCTCCTGAATTAAAGTATGA 
GGTATGACGTTACGTTAAGGTCAATACTTATTTTTTAAAATAGGTATCTAA 

AAAGGTATAGCGTTGTTTCCCCTTACCATTAAAATAATTTTTCCCAAAATT
ATCTAATAAAGCGTGAAACTCATTTGCATCTTGATTTGAAAATGATTCAGG 
AAGTGTTAATTCCTGTTTCAATTTATGATAACTTTCTGTATGTTGATATCC 
CAATTTTCTAAAAATACGTCTAGTATAACTATCAGGTATGAATTCTTTACC 
TTTAAAAATATATACTATTAAGACATCGGCGGTCTCTTCACCTATACCACG 
GATGGTGAGTAATTCTTTTCTTAAGCTATCACCGTATAACTTAGCTATACT
ACTATAATCAAAATGATGTTGATTTAACCATAGTAACAATGCCTGTATAGC
CTTAGCTTTATTTTTATAGAAACCACTCGAACGTATCACTTGCTGCAACGA 
TTCTAAAGGCATTTTCAATATCGTCTGTGCATTAAAAGAAGTTTCTTCTTT 
TAATCTTGATAACGCTATATCTGCATTGTTCCAATTAGTATTTTGGACTAG
AATTGCCCCAAGCATCATTTCTATTGGCGTTTCTGCTGGCCACCAATATTG 
AGGGCCCATATTGTATTTTGAGGAACAATAGGTATCTATAATGATTGATTT 
AACAATGTTTGTAACATGTTAACAGATTTTACAAAGACGGTAGCTAAATAT 
ATAATATGGTACATAGATGATGAAAGTGAGATGACATTATGGGACGTAAAT

GGAACAACATTAAAGAGA7VAAAAGCCCAAAAAGATAAAAATACTAGTAGAA
TATATGCCAAATTTGGTAAAGAAATATATGTAGCTGCAAAGTCTGGTGAGC 
CTAATCCAGAGTCAAATCAAACTTTAAGATTAGTATTAGAACGTGCAAAAA 
CATATTCAGTACCTAATCATATTATAGATAGAGCTATTGATAAGGCTAAAG 
GCGCTGGTGACGAAAACTACGATCACTTAAGATATGAAGGTTTTGGTCCGA 

ATGGTTCAATGCTTATAGTTGACGCATTAACAAACAATGTAAATCGTACAG
CATCAGATGTACGTGCTGCGTTCGGTAAGAATGGAGGAAATATGGGAGTAT 
CTGGTTCAGTAGCTTATATGTTTGACCATACTGCAACCTTTGGTGTAGAAG 
GTAAATCTGTAGATGAAGTCTTAGAAACACTAATGGAGCAAGATATTGATG 
TAAGAGATGTAATTGATGACAATGGCTTGACTATTGTTTACGCAGAACCAG 

ATCAATTTGCACAAGTTCAGGATGCATTACGTGAAGCTGGCGTTGAGGAAT
TTAAAGTAGCAGAGTTTGAAATGTTACCTCAAACTGATATTGAGTTGTCTG 
AAGAGGATCAAGCTATTTTTGAAAAATTAATCGATGCACTTGAAGACTTGG 
AAGATGTTCAAAATGTTTTCCATAATGTAGATTTAAAATAATGATGACTGC 
TAATGATTGGATAGACCGGTTGGAATTAATTTCGCATCCTGAAAGTTTCAA 

AACGATTTCGATCAAACTGTTTATTTGTGTCAAGAAATGTTCCATCCATAT
CTACCGCAATGGCTTTAATCATCGTTAATCCCCCTATCTATAGATTAAAAT 
TCATAATCATTATATAGCTGTTGGATTTGGTATTGAAGCAATATCAAATAT 
AGTGAAAACATATTCATATATTGAAATTGTTTTCAACATAAAAAATAATTA 
CCTTTGTCATTCACACTATCGCTTAATATATAAGATAAATCTCTTATTTTC 

ATTTATATAACTTGAATCAAAAATATTAAAAAGAGCCAAGGACATTCACTT
TGTCCTCGACCCTCTAATCAATAAGTTATTAAAATTTATTTTAACAAAGCA 
TCTTCTCGAAGAAGAACATAATTAATACAGTTTATTACGTATCTTTCACTT 
ATAAGAATATTAGTTAATTTGTGTAGCTATGAATATTTTCCTTGTTTCCCA 
AACACCCGTTTCTTCGTTATAGTCATCGCAAGTAATTAATGTTAATTGATT 

TTTCTTATTAGGATGTTCGTCTAATACCTTAACCTCTGTAGGCTTAACATC
ACGTATTTTAGTTATTTTATACTTTCTAGTTTGATTTCCAGTTTTAAAATA 
CACTTTACTACCGATTTTGGCTGATTTTAAATTTGTAAATTGATAGTGCGA 
ACGATCTGTAAACGTATGACCAGCAATTGAAATATTCTGTTGATTAAGAGA 
TTCGTCACCTTCTGCAAAACTAACACCTCTATTGAGTTGTTCTGGTGTTGC 

TGGACCAGGGTATACTGGTTCTTTTATTTGTGCATCTGGAACTTCTATATA
ACCAGCCATTTTAGATTTATCGGAAGGTATCTTTGGCGTCGATTTAGATGT 
CTTTGTCTGTTCTTTTTCCTTTTTATCATAATTTTCAATTTTATGATCGTT
ATCTTTTTCATGTAGATAATTATCGATATATGGCTTTGA 


Sequence 3429 

step.1003a04.cons.ok

ATAGTGTTTGATCTAGTGCACCTGCGGATCCAGTATTAATGACGACTTCTG 
GATTAAATTTTTCTATTAACAAAGTCGTTGAGATAGAAGCATTAACTTTAC 

CTATACCACTTTGTGTTAAAACCACCTCTTTGTGGTTTAGCTTGCCAACAT
AAAATTTAACATGCGCAATATTTATTTCATTCATATCATTCAATTTACGCT 
TTAAAATCGTCACTTCTTCTTCCATTGCTCCAATAATTCCTATCATATATC 
TTTTTACACCTCATTTATAAAGCATTAGCATAAGTATTTTATCACATTTTA 
AATATCTCCACGATAGCTAGTTTTAAAGTATCAATGCTTAAATCACTATGA 

TTAAATAATATCTCTTCTTATTTATAATTAAAAACTACCCTGCAATTGATT
TAAAGACAGTTGCGGGTAGTCATAGAACTTTATTATACAATTAACCTTTCC 
AAAGACTTGCTAAATCTTGTAATGAAAAGATACCAGTCATTATTGTGACAA 
TTACGGCTATAATACCAAAAACTAGCATCCATGTTGGGTGTTGGTAATTAC 
CAACGATAGATTTTTTCCTACTTGCAATGAGAATTGCACCTAATGTGATTG 
GTAGAATCCAACCATTAATCGCACCAGCTATTATAAGTAAACTCACCGGTT
TACCAATAAATAAGAAAACAAAAGTTGAAATTACAATAAATGTAATAACGA 
TAAGATTATTTTTATTGAGTAACGATTTGTGTAGTGTTTTTAAAAATGTTG 
OSCTTGTATATGCAGAACCAATTACTGAGGACATTGCTGCTGCAAATATTA 
CTACGCCAAAAATATTTTTACCTATAGGACCTAATGCATGTTGGAAAACTG
ATGCTGGTGGATTTTCTGAACTAAGCGTAACGCCAGTTACAACAACACCTA 
GTACAGCTA/^AAACAATAAGGTGCGCATGACACCAGTTGTTAAAATACCTG 
CTACAGCAGATCGATTTACGAAAGGAAGGTATGACTTACCTTTTATACCAG 
AATCTAGAATTCTATGTGCACCTGCAAAAGTAATATAACCCCCTACTGTAC 

CACCAACTAATGTAATTATAGGTAATATAAGTTTGAAAGGATGTTCAGGTG
CAAATGTATGTACTAATGCATCTCCATAAGGGGGATTTGAAACAACCATGA 
CATAAGCGACTACTAAAATCATTACGATACCTAGAATCATACTAATAACAT 
CCATTATTTTCTGACCACTTCTACTAACAAAGATAAGTATCGCAAAAATAG 
CTGTTATTGCAGCACCCCATTTTACATCAAGACCAAACATTGCATTTAAAC 

CTAAACCTGCACCAGCAATATTACCTATGTTAAATGCGAGACCACCAAATG
CAATTAGTATGGAGATAATAGTACCAAGTCCAGGTAATACTTTATTAGATA 
TTTCTTGTCCACGTAATCCAGTTACCACTAATATTCTCCAAATATTTATTT 
GAGCGCCTATATCTATAATAATAGAAATTAATATTGCAAATGCAAAACTAG 
CATAAAATTGTGCAGTAAACACTGCAGTTTGAGTCAGAAATGCTGGACCAA 

TAGCTGATGTAGCCATTAAAAAAACTGAACCATATAATAATCGTCTATGTT
TTTTTGTGAATTTAAAATTTTGTCCTTTTTGATTGAAATCTTGTTTTGTAT 
TTTCCCCCATCTTTTGACCCCCTATAAGGATTGAATATCAATGCCTTCTTT 
CATTAATTCATTTCTAATTTGCGTAACAAATTCTAATGCGTGTTTTCCATC 
TCCGTGAACACAAATTGTATCAGCTTTCAAATCGATGATTTTTCCATTTTT 

ACTTACAACTTTATTTTCCAAAACCATTTTT/^TGCTTGTTGGATTGCTTC
GTCAGTATTAGTGATAGTGGCATCGGTTTTTTTTCTACTTACCAATTGTCC 
GTCATCTTCATAACGACGGTCAGCAAATACTTCCGAAGCTACCTTAAGCCC 
CACTAATTCAGCTTCCGAAATAAGTAATGTATTCGCTAAGCCGACGAAAAT 
TAGATTTGAGTCGAAATCAAAAACTGCTTGAGCAATTGCATGTGCAATTTC 

TTTATTTCTAGCCCCCATTTGATAAAGGGCACCATGAGGTTTAACATGCAT
CATTTTTACATGATTGATTTTACAAAAGCCACTTAACGCACCTAATTGATA 
TATAACTAAGTTATAAATTTCATTTGGAGTAAGATCCATTTTACGACGACC 
AAAGCCTTTCAAATCAGGAAGACCCGGATGTGCTCCAATACTTATATTATT 
TTTTTTCGCTAATTGAACAGTTTCATTCATAACATCTTCATCACCTGCATG 

ATAACCACAAGCAATATTAGCAGAAGTTATTAATGGTAAAATGTGCTGATC
TCCACCGAATGTATAGTTACCAAAAGCTTCTCCTAAATCACAATTTAAATC 
AATTTTCAACTTCATTTCACCTCTTTAATGATTTGATGTTTTTCAAGAAAT 
TTTATATCTACTTTACTGGCATCGCCTTTTGCATATATTGGATAATTTAAA 
ACCGCATATAAAAAGTCAGCTGTAGTAGTAAAACCTTCGATGACCATTTCA 

TCTAACGCAACTTTTAATTTATTAATAGCAGTTTGTCTATTAGAATCCTTT
ACAATCACTTTAGCTACAAGTGAATCATAATAAGGAGAAACCTGATAACCT 
GTGTAAAGTAAAGAATCTACACGTATATTAAATCCTTGTGGTAAGTGTAAT 
TTATTAACTTTTCCTGGAGAGAATTAAATGCTAAACCAGTTCAAGCTGGTG 
CTTACGATATTCATTTCGTAGACAATGGATACCAATACAACTTCACTTCAA 

ATGGTTCTGAATGGTCATGGAGCTACGCTGTAGCTGGTTCAGATGCTGATT
ACACAGAATCATCATCAAACCAAGAAGTAAGTGCAAATACACAATCTAGTA 
ACACAAATGTACAAGCTGTTTCAGCTCCAACTTCTTCAGAAAGTCGTAGCT 
ACAGCACATCAACTACTTCATACTCAGCACCAAGCCATAACTACAGCTCTC 
ACAGTAGTTCAGTAAGATTATCAAATGGTAATACTGCTGGTTCTGTAGGTT 

CATATGCTGCTGCTCAAATGGCTGCACGTACTGGTGTATCTGCTTCAACAT
GGGAACACATCATTGCTAGAGAATCAAATGGTCAATTACATGCACGTAATG 
CTTCAGGTGCTGCTGGATTATTCCAAACTATGCCAGGTTGGGGTTCAACTG 
GTTCAGTAAATGATCAAATCAATGCCGCTTATAAAGCATATAAAGCACAAG 
GTTTATCTGCTTGGGGTATGTAATCCAAATCTAATAATATAATTAGAAAGA 

GTGAGACAGCATTCGTTGTTCTCTCAGCAACGAGGACTAAAATTGACTCCT
TGTATATAAGTAAGTTTTAATTTTAGTTATCGATGCAATACATTAAAAAGC 
CTAGGACACATTGCTTTGTGTTCTAGGCTCTTTTTTATTTGAAAGTCTAAT 
TAAAATGATTTATATAACAAAGTGATGTTAACAAAGTGAAAATATAACATC 
TAGACTGAAAAAATATAATAAAATTAAATGATATTATGTTTGTATTTATTT 
AAAAGATATGGTTATATAAATGGTGTGTTTAATATCATATATAGATACATT
TAAATTGCTTGCCTACACAACTTATATGAAGATGCTCTAATAAATGTATAC 
CTATTTAATTTGTTAAAGTATAATATTAAAGTAAAAAGATGACATGTCAAT 
GCAATTGAGGATGTAATTGTGATAATATTAAGA 


Sequence 3430

step.1003a06.cons.ok

GTATTATATTCAATATTGACGACATAGTCATCTTGATCTTCTTCAATAATA 

AATGACAAATCAAATTTAGCTGTTGTTGACTGAGGTGGAATATGTGTCAAT
TGACTATGTCCAAAATTCGCATGATTTGTTTCATTATTTTGAAGTACGAGC 
ATCACATCAAATAACGGATTATGTGAAGCATCTCTTTCATCAACAAGATCA 
TTGACTAAGCTTTCAAAAGGATATTCTTGATGTTCATATGCCCCTAGACAC 
ATCTCTTTCATCTCAGCCATCAATTGATCCCATGTCTTTTGATCATGTGGT 

CGACCACGATATACAAGTGTATTAGCAAACATACCTAACATATTTTCAGTA
TCGCGATGAGTACGCGCACTGATTACACTACCAATAGCGATATCGTCCTGA 
CGTGTATATTTGTGCAATAATACCATGATTGCACTAGCAAAGAACATAAAG 
TCTGTCACTTGATGTTGTTCTACATAAGATTTCAATTGCTGTTTGATTTGA 
CGATTGTAATGAAACGTCAACATATTACCGTTGGTTGTTTTAATACTTGGT 

CTAGGATAATCCGTAGGCATATTTAATATTGGAACCTGATTTTCAAATTGC
TGTAACCAAAAGTGACGTTGTTTAGATAAGTCTCTGTGCACCATCCACTCA 
CTATAATCTTTATACTGAAGCTTAAGTTCAGGTAATGATTTATCTTGGTAT 
AAAGCGTTCAAATCAGATAGTAAAATCGTGTTACTCATACCATCATTAATA 
CTATGATGAGTATCCATAAATAAATAATCTTGTTGTGGTCCATGTATATAT 

TTAACTCGCATCTGACTTGGTTGTTCTAAATCAAACGGTTCCATAAATGAT
TGAATAATATCTTGCTCGTTCGTTAGAGATGTCGTTACCTCTTCAAAATCA 
GGCGAAACATGTGTCGCAATACGTTGTTTAACTTCATTGTCATCAATTACA 
TATTGTGTTCGTAATATTTCATGACGTTCAATCAACTTAGATAATGCACGT 
TGCAATTGCATAACATTAAGTTCAGAAGATAATCTCCATAAGAATGGAATG 

TTATACACTGTGTCTTTAGGATTGACCTTCCATAAAAGATACATACTTTTT
TGAGACGCACTTAAATCATATTGATACGATTCATTTGCTTTGGGAATCACT 
TCATAGACATCATTTTGCAGTTCTTCAATTTGTTGTCCAAGTTGCTCTACA 
GTAGGCGATTTCATTAAATCACCTACTTTAAGACGTTTTTTTAACCTTTCT 
TCAATACGGTTTACAACTTyVTGTTGCTCTAAGAGAGTGTCCACCTAGTTCA 

AAGAAATTATCTTTAACACCTACCTGATCAACATGTAAAATCTCTTCGAAA
ATACGGCAAACTGTGCGTTCAATATCGTTACGTGGTTCTACATAATTTCTA 
TTATTCTTTAGATTAATTTCAGGTAATGCACGCACATCTAATTTCCCATTC 
ATCGTGATAGGTATACGATCCACCTTCATAAAATGCACAGGTATCATATAT 
TCAGGTAATGTTTCACTTAAAATATCTTTTAATTGACCTGTTGATTTTAAT 

TGCGATGCTTCATAATATGCCACTATTTGTTTATCTTGGTCTTGCTCTCGA
ACGATGACTACAGCTTTATTAATATCACGTATAGCTTCTAATGCTTTTTCA 
ATTTCTGATAATTCTATTCTAAAACCGCGTATTTTAACTTGCTTATCGATA 
CGACTAATATAATCAATATAGCCATCTTCTTGAAGACGAACTAAATCACCG 
CTTCGATAAAGCATTTCATTGTTAAAAGGTGACTGAATAAAACGTTCAGCA 

GTAAGTTTAGGTTGATTTAAATAACCTTTTGCTAAACCTGCACCACCAATG
CACAATTCACCTGGAACACCTACGCCACAAATACGATTACCTTGCATGACA 
TAAACTGTCGTTCCACTAATAGGTAAACCAATAGGTATACGTGAAGGCATC 
TCTTGTGGAATCGCAAAAGTTGTAGTAAATGTTGTATTCTCTGTCGGTCCA 
TAACCATTGATTATTTGAGGATGACACTCACGCGAATTTAATAAGTGAACC 

CATTTAGCATTTAACACTTCCCCACCAATAAGTAAATAAGTTAAAGATTCT
AGTGCTTCGATACGTTCGCTAGCAATTTGATTAAATAAAGATGACGTTAAC 
CACATCGTGTTGACTTTATTTTCAGTAATAGCTTGATCTAACAATTGAGGA 
TTTAACAACGTATCTTTAGATGTAATGACTAACCGTCCACCATTCAATAAT 
GGACCATATATTTCAAAGGTTGCTGCATCAAAAGCTACTGTTCCTGATAAT 

AAGACGGTTGTATTTTCGTTCAATTCGACATAATTTGGATTGTGTACTAAG
CGATCAATTCCTCTATGTGGCACCAGTGTCCCTTTAGGTTTACCAGTCGTT 
CCTGATGTATAGATGACATAAGCGATATCTTCTGAACAATTAATGCCTCTC 
GGGTTATCAATATCATGTTCTCTTGAATCAACTATCAATTCTATATCCATT 
TGAGGTAAACCTGATTGAAATGATGTACGATATGTTACAACCGCTTTAGGT 
TTTGCGTCCTCAATAATATAATTCATTCTTTCTTCAGGATAATCCGGATCA
ATTGGTATGTAGCCTGCACCAGCTTTCAAGATCCCTAACATTCCAATAATC 
ATTTCTAAGCTGCGTTCTGCTATTAATGCCACCATATCATTAGGTTCAATC 
TGATGATTTAAACGCAATCTATATGCTAAAAGATTCGCACATTGATTTAAT 
TGATGATATGTCATCGATCGTTGTTCATATTGCAACGCAATACTATCGGGA
TGTTGGTGCACTTGACGTTCGAATCTTTCAACCACGGTTTTATAAGTGACC 
TCACTATAATTAATGTCATTGTTTTGAAGATTGATGTCATTATATTTAGCT 
AAATCTGAGCGATCACTGAGTTTTATATCTTTAATTAATAATGATGGATTT 
TGTGTAATTTGCATATAAATCCCATTAATTAATTTGACTAACGTATTTACA 

GTTAATTCATCATAAACGTTGTCATTATAAACTATATCAAAACCACCCTGC
ACATGAGGATAAAACTCTAAATCCGCGAATGATGATGCTTCATTGTATAGA 
CGATGTATATGATGACATATATGATTAAGCTCAAATTGTTCTTGTACTTTT 
TCAATATGAATCATCGTTTCTACATCTATTTGTACTGCTTTAGGTTGAACA 
ACAAAAGACGCACCACACATTTGAAGCTCTTCAAGTAGCGCGGAACACTCA 

TCCACCATATCTTTTACGGCGTCACTTTGACTAATACTTAAATTAAGTGGG
GCAATGTCTGTGTTTAAAACCATCATATTCTCAGTATTTTTATTATCTATC 
GAAAAATGGATGCCTAACGTCACATCAGAAGACTGACTTATAAAATGATGC 
GCTAAGTAAATACTTACAATCCACGCATCAATGTCTT^GATGGTAGTTCT 
GCATGCAAAGAACACATTTTTTTCTTTTCAGATGTATTCTTTACAGGTATG 

TAAGCGTAATTATCACCGTCTGCATTCAATAC



Sequence 3431 

step.1003a07.cons.ok

AACATTAAGATACTCCCCTTGCTCATTTACACCCAAACTAGTGATAATTGG
TATTATGTCGCGTGATAACATTTTTTCTAATAGCATCGTGTTGATTTCTTT 
AACTTTACCAACGTAACCATAGATATTTTTATTTAAAAATTCGGCGCTTAC 
AAGTTCTTTTATCTTTTCTTTAAATTGAAAAAATTCTAAATTAGATTCTTT 
AAGTCTTTCTAAAAGTTGATGGCCAACTATGTTAATTAAAGCATCATATAT 

GATAGGTAAATCATTTTTAGCAGTTACTCTCATGCCATCAATTTTAATAGT
TGAATGATTGTTTTTAGTTAATAAATTACTGATGACTTGGCCGCCACCGTG 
AACAATAATTATTTTTTTGTTTTCAAGGTGCCAAGCATTAATTTGTTGTAT 
AAATGCGTCGTTTAAATTTTCTATAGCTATACCACCGAGCTTAATTACGAT 
GATGTTTTTCATATGTTCCTCCTAAGTGGTGTATAGTGCGTTGATTTTAAC 

GTAGTCATACGATAAATCACATCCCCATGCTTGACCTTCACAATTCCCTTG
ATGAAGGTCAAGCTGAATTGATATTTCTTCAGCACTCATTATTTCTTGAAT 
TTCTTCTTTATCGTACTTTACTGGTGAGGATCTTATTAATACAGGTATCCT 
ACCTATAAAAATGTCTACCTGATTAATATCAAAATATGTTTTAGCATAACC 
TGCAGCAGCAATAATTCTACCCCAATTAGGATCTTCGCCAAAAATTGCGGT 

TTTTACTAAACTTGAACCCACCACACTTTTAGCAATCATTCTTGCAGCACT
AGATTCTTTTGCACCTTTAACCGTGACTTCTATTAATTTAGAAGCACCTTC 
GCCATCCCTTGCAATACTTTTTGCTAAATCGGTCATAATATATAGAAGCAT 
CTGCTTAAATTTATAGTAGTCTTCGCTGTCTTTTTTAATTTCGTTATTATT 
TGTACATCCATTTGACATCACAAGCACCATATCATTTGTTGAAGTGTCACC 

ATCTACAGTGATTTGATTGAATGTAACTTCAACCACATCTTTTAAAGCCTG
TTGTAATGTTTGTGATGAGATGTTAGCGTCACAGGTTATAAATGCTAGCAT 
TGTAGCCAAATTAGGATGTATCATTCCTGACCCTTTTGCTACACCTGCCAT 
CGTTACTGTATCGCTACCAAATTCTTCGTTTACAACGCATGTTTTTGTATG 
AGTATCCGTTGTTAATATCGCTTTTGCAAAGTCATCAGCATTACCGTTTTT 

AACTAGTTTGGAAAAGCCATTCTTTAGAATAGACATTGGCATCACCTTTCC
AATAACACCAGTAGATGCGACACCAACATATTCTGGTTGAATTTGTAATTT 
ATTTGCGGCCAGTTGTTGCATTTTAAAAGCATCTTTTTCTCCTTGTTTACC 
AGTACAAGAATTAGCAATACCTGAATTAACAACAATAGCTTGCATTTTACC 
ACTTTTTTCGATGCTGTTTTTTGTTAATTTTAATGGTGCAGCAATGACC^ 

ATTAGTTGTAAATACACCAGCTACATTTGCAGGTACTTCTGAAACAATCCA
ACCAAAGTCTAATTTTTTCTTTTTAAAGCCAGCGTGCAGACCATCAGCTGA 
AAATCCAAGAGGACTTGCAATATTTCCCTTAATTATATTCATAATTTAGTC 
CCCCTATATTAGATATATAAAGGCGCCAAAATTAAACCATCAGTTTCGTCG 
AAATTAAACATCAGATTCATGTTTTGAATTGCTTGTCCAGCGGCACCTTTA 
ATTAAATTATCAATAACTGATGAAATTGTTAAAACACCCGTGGTTTCGTTA
TAAACAAAACCAATATCAGTGTAATTTGTTCCTATTACTTCATTTAACTGT 
GGTAATCCATCTTTAATTCTTATGAAAGGCTTATTTTTATATACATCCTTG 
TAGGTAGATTCTATTTGATTTATTTTAACTCCGTTTTCAAGCCGTGTGTAT 
ATAGTTGCAACAATTAATTAGAATCTAGTGATTGATTGTCAGATTGTACCT
TAGCAAGTGCTTTTTCAATCTTATCACTACCTTGTAGTGAACGTGTAATTG 
CTTGTATATCTTGTTTTTCATCTGTGACATGTTGATCTTTTGAATCCCTAA 
TTTGATCAGTATCTGATGCTATACGTTCTAATGTTGAGCGACGACCATCAT 
TACTTTTATCGTCTTTACTATCACTATCGGATTGATTTGTTGTCGTACTAG 

ATTCTTCTTTGTTATTTTCGTTTTTATTACGATGGTGATGCTTGCTGATCG
CTTTGTTCAGAAAGCTTATTATTAGAGTCAGACTTAAAATCTTGTTGGATT 
TTATCTAATTTATCAATAACCGACTCTTCAGAATCATCAGTAGTTGCATTG 
CGTGTACTAGCATTTGTATCGTGTACTTTATTATTTGAATCATCTTTTTCA 
TCGGTTGTTGAGTCACTAGGTTTAGAAGCACTATTTTCACCTTGTTGACGG 

TTATCAACATTTTTAGCAATTGAATCTAAATCCGACAGTATTTTATCTAAG
TCTTTAGTCATATCATCATTATCTTTTTCAGATGTATCGTCATCTTCAGTT 
TTTGATGATGTGGCACTAGAGTCTGTCGATTTTAAATGATTATTTTTATCT 
TTTGCATCATTTATAATACGGTCAATACTTGATTGATTATCGACTTTATCT 
AAACGTTCTTCAAAACTTTTGATATCTGCGTCAGAAAGTTCTTTCAAATCT 

20 TTAATATCATCTTTAGCTTGTTGGATTTTGTCAGCGATGGAGTTACTATTA 
GAAGTATTTTCAACATTTGATGATGTACTTTCTGAATCTGCAAAAGCATCT 
TTTGTGAAGCCATAACCAACCAGCGAACCACTAAGTAACAACGCAGAACAT 
GACAGCTTGAGGTAGACACCAATTTTATCTGTTCTCTTCATTTACACATAC 
TCCTTTAATAAAGTTACCTGTATTATTCTTTAAATAGATTTATATGATTTA 

25 ATCAAAAATATACATATCACATTATAATACATAATTTCATAGATTTGTAGA 
TAATGAACCTAAAATAACAAATTTAAAAAACAAATTTCTTGTAATTATTAG 
TAATTTGGTATTACACTTCTTT 



Sequence 3432

step.1003a08.cons.ok

TGGAT6AACTTGTTGTCCTTTAATAATATGACTTGCTTCAATAAGATCAGA 
TAATCTTGCATTCGTGCATGAACCTAAAAAAACATAACCTAATTTTATATC 
TTCGGCTTTTTGACCTGGGTGAAGTCCCATATAGTCATAAGCACGTTGGTC 

35 ATTTGCATTTTTAATTTCTGGGAATGGATTACTAAAACTAACTCCCATTTC 
TGGGTTAGTTCCCCAAGTTACTTGAGGTTCTAAATTTGTTACATCAAGTTC 
AATAACTTTATCAAAATAGGCATCATCATCAGAATAAAGTTCTTTCCACCA 
CGCCATAGAACTATCAAAATCAGTAGCATAAGGACGACCTTTTACGTAATT 
AAAGGTTGTTTCATCAGGTTGCATTAAACCATACTTTGCTCCTGCTTCAAT 

40 AGCCATATTACAAATCGTCATACGTGCTTCCATGGATAAATTTTTTATAGT 
TTCTCCTGTAAATTCAAGAGCATAACCAGTACCGAAATCTACACCATATTG 
ATTGATTAAATATAAAATAATGTCTTTAGCATAAACTCCAGCTGGTAAAGA 
ACCATTTATATTAATTTTCAAATTTTTAGGTTTTGTTTGCCATAATGTTTG 
TGTGGCAAATACATGTTCAACCTCGCTTGTTCCTATCCCAAATGCGATAGC 

45 ACCAAATGCGCCATGTGTAGCAGTATGTGAATCTCCACATACAATAGTTTT 
TCCTGGTTGAGTTAATCCAGTTTCnX^GTCCAACCATATGCACAATACCTTG 
TTCGTCAGAACCCATATCAAAAATATGTACACCAAAGTCCTTAGCATTTTG 
TTGTAAAGTTGTAATTTGTTTATTAGCAATTTCATCTTTTATATTAAAAAT 
ATCAATTGTGGGAACGTTATGATCTAAAGTTGCAAAGGTTAGATCAGGTCT 

50 TCTGAGTTTACGATTTTGTATTCTAAGTCCTTCAAACGCTTGAGGAGAAGT 
GACTTCATGAATGAGATGTAAATCAATGTATAATAATTGTGGTTCACCTTC 
TTTTCCATGAAGCACATGTTTTTTCCATACTTTATCAAACAGTGTTTGACC 
CATAATTTTCTCCTCCTCTTATAGATATTTTTCTTTAAGCAATTTAAAAAT 
TTCTGAAGTTCGATATTGTCCACCTAAATCTGCAGTTGTCTTATTAGATTG 

55 AATAAACGAGTAAACAATTGACTCAAGTTCGTTAGCAGCATCATTTTGATT 
TAAACTTTCTCTTAAGCAAAGTGCTAAAGATAGAACCATACCAAATGGATT 
CGCTTTATCTTCATTAGCTATATATAAAGAACAATGTGAACTGACCTGTTA 
AAAGTCGTGCAACAAGTCCTCCACCTAAGTTGTGGATAGCTTTTACATTTC 
CATCATTGGTAGCATTGATAGTTGTACCATATGGAGCGCCATAGTCAATAC 
CAAAGTGAGCACCACCATTAAAACTATAACCAGGAGCACCACCATTAGGAT
AATACCCCGTAGTAATTGGGAATTTAGTGAACGAAGAACCATCTCCACCGC 

CTGCAGAGTCTAAACCTTCATTAAAGAAGTCTTTAATACCATTTTTAAGTT 
TCTTAATCATAGCTGTCATTAAATCAAATGGTAATGGTGCATTTTTCATAA 
5 AGTCTAGGTTAAAACCAACTTTCTCAAAGACTTTATTTACCAATTTCATAG 
GATGACCTACATAATCAAAAACATCTCCAATACCCTTGATAACAGTGCTAG 
ACACATCTTTAGCTTTATCAAGTGCTTTTGTACCTATGTCTTTTGCTTTGT 
CAAACGTAGTACCACCTATGTCTCTTGCTTTACCACCAATGTCTTTTACTT 
TATTAACATCATTTTTTACACCAGAGCTAATTU^CATCTAATATACCGTCTT 

10 TCTTTTTAGTACCTTTACTAAATTTAGGTAGTTTTTTCTTCTTAGTGTCAT
ATCCTGAATTAGATAAAATAGCATGTGTTTGCGCGCCACTATATACACTTG 
AGCCTTTAGGTAAGAATGTTGTTGTATCTTTGTTAGGTGTTAGTGCCATTC 
TTCCGTTAGGATAACGGATCATTTCATTTCTAAAGCCACCAGGTCCATTAC 
CTCGACCTCTATCTCCTACAGTAGCCATTGTACCTTTATTGATTTTACCGT 

15 TGGTAATATAACTTTGTGTATGCGTAGATTGAGTACCTGTAGATAATTTAA
TCTTAGGTATTTTATCCATACCTAATTTATCTGCGACCCAGTTAACACCTT 
TGATTAATCCATTAAGGCCTTTTTTAACGCCTTTAACCATTCCACCAAAAA 
GATTCCCAATTTTACCGGTAACTGTTTTGATACCGTCTCTCATTTTATTCA 
TGGTGCCCATAACTTTCGATTTCATGCCATTTACTATAGATACAGTAGTAC 

20 TTTTAATTCCATTCCATTTCTTGCTCATGAAGCCACCTACAGCATTCATGG 
TGTTATGAGTACCTTTTTTAAGTGATCCCCAAGCACCTTTGACGCCTGACC 
ATAGAGCTTTCGCTTTATTAACAGTACCTTTTTTAATACTGTTCCATTTAG 
AACTCATGAAACTGCCAACTGCTTTAAATATGCCAATTGTACCTTTTTTGA 
GTGCATTCCATGTATTTTTTU^CTCCAGACCATAATGCTTTTGCTTTATTCA 

25 CTACTGATTTTTTTATAGCCGTCCAGATTTTAACTGCGGCATTCTTTACAG 
CATTAAATATTACAACAATACCTTTTTTTAATGCGTTAAATACAGATAGAA 
CACCTTTGCGCAAAGCTCGAACGATTCCTAACACACCATTTTTTAATGCAG 
TCCAAACTTTAATAGAGAAACTCTTAATAGCATTAAATATCGTAACTACTA 
TGCGCTTAATAAGGTTGATATTAAATTTTACTTGCGCAACATATGCTTTAA 

30 TAAACTAAGTAAATTAGAAGTACAAGATCGAACACAAGCAGTAATATATGC 
GTTCCAGCATAATTTAATTCAATAAATTATTAAAGGCGAAAGTAAAGATAC 
ATCTATCGATACTTTCGCCTTTCAATATTATCAGTC6CTTGTGTGATGTTC 
TTC 


Sequence 3433 

step.1003a11.cons.ok

TTTAATTGTTCTACTTCTGACAGCTCTTCTAATCCTTTACCGTAGGTATAT 
TGTTTACCAACTTGTTGTGAAAATCTATAACTTTCCCCATTTCGATACCAT 

40 CGCCACCAAGTTTTCACTTGCGTCCTATTTCTAATATTTAATTCTTTCATA 
ATTTCTTTTGTTGAAAATCCTGCTGCTTTCATTTCAACTGCTTTATACTTT 
GTTTCTACTGAATAAGAA6CTCTTTTCATAGAAAAAAACCTCCGTATGATT 
CATTTTAATATGAATTCAACGAAAGTGTTTTTATATAATTCCCACAAATTG 
GGGTCAGTCTAATAATGAATTTCGTGATTGTAGGCTTTTTATTTTATATAT 

45 TCATTAATTTCTCTTCTTGACGGATACGTACAATTAAAAAATAAGCATATG 
GAACTAATAAGAGTGTTGTGTATGTAGCATTTGTTAGTAGTAATACACCAA 
TTAATTCGGGAATGATATTTAAAAAATAGTTTGGATGTTTCGTTACTTTAT 
ATAACCCTGACTTAATAATAGGGTGATTGGGCAGTATAAATAATTTCAATG 
TCCAAATACGACCTAATGTCTTAATAACTATAAATAGCATGATATAAGCGA 

50 TGATCAATATGATTAAGCCAATGCCGTTAAGTAGACTAAATGTATCCTTAC 
GAATGAATGCTTCTATAGCTGCACTCATGTAAATTAATACATGCGTAATGG 
CTAGATATTTTGAATTTTTCACACCATATTCCACCGCGCCCTCTACCTTTA 
GCTGTTTTGCGTGTTGCATAGATATCTTTAAGCTGATGAGTCGAATACAGA 
AAAAGATAAATAAAATAGTTAAAATCATGATGTCCTCCATTGTTTGTTATA 

55 TATATATTCTTTTGCTCGTACAATACAAAGTATTTTATATAGAAAATGTTT 
CAAATGCCATTAACTATTTATTAACTTTAGCACCAACTCATAACTCTTCTA 
ACACCTCACATATTAAGTTCAGTTTTTCGGATAATTTAATACTTTTAAGGA 
TATTAAGCGCTTACATTGAT6TGATATATTTTTTTTAACGAAGATGATAGG 
GGTGTTAGGGATGAACTTTTTTGATATTCATAAAATGCCAAACAAAGGGAT 
ACCATTAGCTGTACAACGCAAATTATGGCTCAGAAACTTTATGCAAGCGTT
TTTTGTCGTATTCTTTGTTTACATGGCGATGTATTTAATTCGAAACAATTT 
TAAAGCGGCACAACCGTTATTAAAAGAAGAAATCGGATTAACAACATTAGA 
ACTAGGTTATATAGGATTAGCGTTTAGTATTACTTACGGTTTAGGAAAAAC 
5 AATACTCGGTTATTTCGTTGATGGGCGTAATACGAAACGTATTATTTCCTT 
CTTATTAATATTATCTGCGATTACAGTACTTATTATGGGATTTGTATTAAG 
TTATTTCGGOTCTGTGATGGGGCTATTAATTGTATTGTGGGGGCTTAACGG 
TATATTTCAATCTGTGGGTGGGCCTGCAAGTTACTCAACGATTTCAAGGTG 
GGCGCCTCGAACAAAGCGCGGTCGTTATTTAGGCTTTTGGAATACATCACA 

10 TAACATTGGTGGTGCTATTGCTGGTGGTGTCGCACTTTGGGGCGCGAATAC 
ATTTTTCCACGGTAATGTGGTTGGAATGTTTATTTTTCCTTCCGTCATCGC 
TTTAATCATTGGGATTGTGACATTATTTATTGGTAAAGATGATCCAGAGGA 
ATTAGGTTGGAATCGTGCCGAAGAAATTTGGGAAGAGCCTATCGACCAAGA 
AAACATTGATTCTCAAGGTATGACTAAATGGGATATCTTTAAAAAATATAT 

15 CCTTGGAAATCCTGTGATTTGGATTTTGTGTATCTCTAATGTTTTTGTATA
TATCGTGCGTATTGGTATTGATAACTGGGCACCGCTATACGTATCAGAGCA 
TTTACATTTTAATAAAGGTGATGCGGTGAATACTATTTTTTACTTTGAAAT 
AGGTGCATTAGTAGCTAGTTTATTGTGGGGCTATATCTCAGATTTATTAAA 
AGGTCGTCGTGCGATTGTAGCGATTGGATGTATGTTTATGATCACCTTTGT 

20 TGTACTCTTTTATACCAATGCAACAAGCGTGACAATGGTCAATATTTCTCT 
ATTTGCATTAGGCGCTTTAATCTTCGGTCCACAGTTACTCATTGGTGTATC 
TCTGACTGGCTTTGTTCCTAAAAATGCAATTAGTGTCGCTAACGGTATGAC 
AGGTTCATTTGCATATCTATTCGGGGATTCAATGGCTAAAGTGGGTCTGGC 
TGCAATCGCTGATCCAACACGTAATGGTTTAAATATTTTTGGGTATACGTT 

25 GAGTGGTTGGACAGATGTCTTTATTGTATTCTATGTAGCTTTATTCTTAGG 
AATGATATTATTAGCCATTGTTGCTTATTACGAAGAAAAGAAAATTAGAAA 
ATTAAAAATTTAAGATTAAGTGAATTTAAATATATACTCCCTACTATAAAT 
TTTATCATTATGGTAGGGAGCTTTTTCGTGATTGATTTGCACTATTTTGAT 
TGCTTATTATAATCACTTGGTGACATATGTAAATATTTTTTAAAATGATAG 

30 CAAAACATTTTATACTCAGTUU^CCTACTTTTTCAGCAATTTCATAATGC 
TTGTAGTGCTGGTCTAAAAGATGTAATGATTTTAAAATACGATAACGATTT 
AGATAATCAACTATCGTAATGCCTACATGTTCTTTAAACGTCCTCATGGCG 
TATGACTCACTTACATCAATAGGATTAATTAAGTCAAGAACAGTCACTTTC 
TTGTGATAATGTTGCTTGATTTGAGACAAAATTTGATTGACATAATAGTCA 

35 TCGTAATCAATTTTTAATAATGGTTGAAAGGCAGTATGATATGCCGCGTCA 
TCATTGGTAGAATGTGGGCGTTCTAATAACCTTTGAACTAATATGTCTAGA 
ATATGCTCTAACTGAGTGTGGTCTACTGGCTTTAGTAAATAATCAAGAACA 
TGATGTTGTATACCGGCTTTCATATATTCGAAGTCGTCATAACTCGATAAA 
ATGATAATCTGGCAATCAAGGTCTTTGATGTCATCTAGTAGGTCAACACCA 

40 TTTTTTCGAGGCATTCGTATATCAGTAATGACGAGTTCAGGTTGATGTTGG 
CGAATTAAAGACAATGCTTCCACGCCGTCTTTGGCAGTATAAACAGTGGTG 
AAATGATAGTCCTCCCATGGAACCATTTGCTTTAAGCCTTCTCTTATAATC 
CTTTCATCATCACAAATAACTACTTTAAACATCTTTATTCCTCCTAGACAA 
GT^TATTTGGTAACACATTAATGTCCCTTGCTGGCTTCTTGAGAAAATGT 

45 GCAGACGTGCATATGTTCCATATTGAATCATGGCTCTATTATGTAAATGAT 
TTAAACCTAGGTGTGTCGTATCAAAAACATCGTGATGAAGGGATTGGCGCA 
CGTGTTCTAAATGTGATGGAGACATACCGATGCCATTATCATGAACCAGAA 
TATGTAATTTGCGCTTCGTAAGTCTGATACGAATTGTTATCTTTAAAGGTT 
CACTATCACGACCATGCTTGATGGCATTTTCTACGAGTGGTTGAAGCATCA 

50 TCTTACCAATTGTTTGATGTTGTACACCCTCAGTAGCATCGATGTAAAGCT 
GTATCATATCATCGAAGCGGATATTTTGTATGGCAACATACTGCTCAATGT 
AGCTTAATTCTTCTGCTAATTTGACTGTGTGCGATGCTGTGCGTAGAGAAT 
AACGTAGCATTTGTGATAGTTGTTGTATGACTGTTTGTGCTACTTTGGGTG 
AAAGAGGAAT 


Sequence 3434 

step.1003b04.cons.ok

GCGCGGCAATTGAACAAAAGAA/yiAAGGTATTGAAACATTAGTAATTGAAA 
AAGGTAATGTTGTTGAATCAATCTATAATTATCCAACACATCAGACTTTTT
TCTCATCAAGTGATAAATTAAGTATCGGCGATATTCCTTTTATTGTTGAAG 
ATAGTAAGCCAAGACGTAATCAAGCGCTTGTATATTATAGGGAAGTCGTTA 
AACATCATCAACTTAACATACATCCATTCGAAGAAGTTTTAACAGTTAAAA 
5 AAATAAACAATAAATTTGCAATTACAACTACAAAAGGTGTATATGAATGTA 
AATATTTAACTGTTGCTACGGGTTATTATGGTCAACATAACACTTTAGAAG 
CGGAAGGGGCAGAATTACCAAAAGTATTCCATTACTTTAAAGAAGCACATC 
CGTATTTTAATCAAAATGTTGTTATTATTGGAGGCAAAAACTCTGCTGTTG 
ATGCTGCCTTAGAATTAGAAAAAGCTGGTGCTAATGTAACTGTTTTATATC 

10 GTGGCGAACAGTACCCTAAAGCAATTAAACCATGGATATTACCCAATTTCG 
AATCATTAGTCAATCACGAAAAAATTACGATGGAATTTAATGCGACAGTAA 
CCAAAATTACCGATCATTCAGTGACTTATGAAAAAGATGGTCAACTTATAG 
AAATTGATAATGACTACGTTTTTGCTATGATTGGTTATCATCCAGATTACG 
ATTTCTTAAAAACAATAGGTATTGATATCCATACCAATGAATATGGAACTG 

15 CTCCTGTTTATAATCGAGAAACATTCGAAACAAACGTCGAAAATTGTTATA
TAGCTGGTGTTATTGCTGCGGGTAATGATGCAAATACTATTTTTATCGAAA 
ATGGTAAATATCATGGTGGTGTCATTACACAAAGCATTTTGACAAAAAAAC 
AAACACCTCTTGAAACATAGAGCTTACAAAAAGCTTGGGATACTAAAC7VAT 
GTCCCAAGCTTTCTTAATGTATAGTCAATCAATGACTCAATTAAAATGCAC 

20 ATATTAATAATAGGTAATCCTTAATTATAGTATAATTGCACTAAAGAAGTT 
TCATAGACAATGCATACAACTAATAGAAATCGGAGTGTGACAATGAACTTG 
ATACGAATTTTGTCCCACTCTTTTTAATTACTCTTCGAAATATCGCTCTAA 
TTGATTTTGAGTCATGTCTTGACTCAAACCAACTAGTAATTTTAGCCTTGC 
CTTTGGTCCATTTAAACCGTTCGAAAAAATAACACCATTATTTTTCAAATC 

25 TGCGCCACCACCTTCATAAGCATATACAGGACTAACAATACCATTGAATGA 
TCTAGATACGAGAACTAGAGGAATGTTTTTCTTTAGACATTGCTGTAGTCC 
ATTAAGACAACTTTTTGGAAGGTTACCTTGTCCTAGTGCTTCGATGACTAT 
ACCATCAACGTGTTGTTGTGAATAAAATGATAGTACATCATCTTCCATACC 
CATGTATGCTTTTACAAGTGGAACACGTAAATTTACATCGATATATTGGTA 

30 GGTAGTTTGTCTGTAAGGATGATGATAAAATTGTACTCGATTCTTGGTAAG 
TACACCTAGAGGCCCCTGATTAGGACTTTGAAATGTATTAATATTCGAAGT 
ATGTGTCTTTGTCACATTACGAGCAGTGTGAATCTCATCATTAAATACGAC 
CATAACACCTTTATGATTAGCCTCAGATGAAGAAGCAACCCTTATAGCAGA 
AATAAAATTATAGAGACCATCGGAACCAATTTCATTGGATGATCTCATTGC 

35 TCCAGTAATTACTATAGGCTCTTGAATATCAATTAATAAATCTATTAAAAA 
AGCTGTTTCCTCAAGTGTATCTGTTCCATGAGTAATGACAAATCCATCATA 
TATATTTTCTTTAGAATATGTAATGATTTCGTCTCTTAATCGCACAACATT 
CGAAATTGTCATATGCGGCGAGGGTATATTTAAAAGATTGATTTCGTCAAC 
CTCTGCATATTGACTAATGATATTTTGATGTTGTGATATTGGATTTTCTTC 

40 ATTCGTTATCACTTTATTAGTTTGATCTTGTGACATACTTATTGTGCCACC 
AGTATGTATGATAAGTAGACGTTTCATTACGAATTCCTCCTATTTATTCTT 
TACTTTTTTATGATAAAATAATATCTATAAAACAACAAGGAAGGTTATCAA 
ATAATGAGTTCAATTAATATTGCACTAGATGGCCCAGCTGCTGCAGGTAAG 
AGTACAATTGCTAAACGTGTAGCCAGTCGTCTATCAATGATATATGTTGAT 

45 ACAGGAGCAATGTATCGTGCCATTACATATAAATATTTACAAAATGGCAAA 
CCCGAAT^TTTTGATTATCTGATTAATAACACTAAACTTGAGCTTACTTAT 
GATG7»lAGTAAAAGGGCAAAGAATCTTACTAGATAATCAAGACGTCACTGAT 
TATTTAAGAGAAAATGATGTAACACATCACGTATCTTATGTTGCATCTAAA 
GAACCAGTGCGTTCATTTGCAGTGAAAATACAAAAAGAATTAGCTGCTAAA 

50 AAAGGTATCGTTATGGATGGCCGAGATATTGGTACAGTTGTATTACCAGAT 
GCCGAATTAAAAGTTTATATGATTGCATCTGTTGCTGAACGTGCTGAACGT 
CGACAAAAAGAGAATGAGCAACGTGGCATTGAATCAAATTTAGAACAATTA 
AAGGAGGAAATTGAAGCACGAGATCATTATGATATGAATCGTGAAATTTCG 
CCATTACAAAAAGCCGAAGATGCTATTACACTTGATACAACTGGCAAATCT 

55 ATAGAAGAGGTAACAAATGAAATATTATCTCTACTTTAAATGTTAAAATTA 
AACTTTTTATATTGTTTATTGTAAACATTTAAGCGGAGTAGAATTAATAAT 
CAATGACTCGAAAATATTGAATATATTAGTATTGATTTATAGAAAAGCCAC 
GGACGCTCTGTCTCTGGCTTTTATTTTTGACCATTGAGAACTATGTAAATA 
AACTTATTTTTAACCAAGTCCCCTCTATAGAGAAGTAATACTATGTTTATT 
ACCAATTTTAATATATCTTTTTCATTTTTATTCCTCATATTATTATGAGCT
TAGGGTTAATTGCCATAAAATTTATGTTACAATAAAAATAACAATGTTATT 
AAATATACTACAAACTAAATTGACCATCACTTAACTAGGTGCTCTGTTAAA 
CTTTTGTTATAAATTAAAGATAGAAGCAACCTATTCATATAAAGTGGTATC 
5 TCTTGACAATTTTACTCGTTTATAAGATGTTATAATTATGTAGTGTATAAG 
GAGGCATACAAGATGACTGAAGAATTCAATGAATCAATGATTAATGATATT 
AAAGAAGGTGACAAAGTCACTGTTGAAGTTCAACAAGTAGAGGATAAACAA 
GTTGTTGTGCATATTAATGGTGGCAAATTTAATGGAATTATTCCTATTAGC 
CAGCTTTCAACACATCATATCGAAAACCCTAGTGAAGTTGTAAAAGTCGGT 

10 GATGAAGTCGAAGCATATGTCACTAAAATCGAGTTCGACGAAGAAAATGAT
ACTGGGGCATACATTTTATCAAAAAGACAACTTGAAACTGAAAAATCTTAT 
GAATATTTACAAGAAAAACTAGATAACGATGAAGTGATTGAAGCTGAAGTT 
ACTGAAGTAGTTAAAGGTGGTTTAGTCGTTGACGTTGGTCAAAGAGGGTTT 
GTACCTGCTTCTCTAATTTCAACTGATTTCATTGAAGATTTTTCTGTATTC 

15 GATGGTCAAACAATCCGTATTAAAGTGGAAGAACTTGATCCTGAAAACAAT
AGAGTCATTTTAAGCCGTAAAGCTGTGGAACAGTTAGAAAACGACGCTAAA 
AAAGCTTCAATATTAGATTCTTTAAATGAAGGCGATGTTATTGATGGTAAA 
GTTGCTCGATTAACTAACTTTGGTGCTTTCATTGATATTGGTGGCGTAGAT 
GGTTTAGTTCACGTTTCTGAATTATCTCATGAACATGTTCAA 


Sequence 3435 

step.1003b05.cons.ok

ATCTACTATTTTCTCAACATCTTCTTTTGTATGAATGCCACTCTCTGAAAT 

25 GTAGCAGCAATTAGACTTTTTAAACTTAAGTAATTTATTTGTATGTAGAAC 
ATCGGTTTCAAATCGTTTTAAATCACGATTATTAACACCAATAATTTTAGG 
GTTAATTTGGTGTGCACGTTCAAGTTCTCTAATTGTATGAACTTCTACTAG 
AGCTTCTAAATTATGGTTTGTTGCATATGAATACAATTCTTTTAATTGGTC 
ATCACTTAAAATATTTACTATTAATAAAATAATAGATGCACCAGCTCGTTT 

30 TGCAACATCTATTTGAATTTTATCAATAATAAAATCTTTACATAAAACTGG 
TAACGATGTTATCTTTGATAACTGATTTAATC6TTCAAAACTACCGCCAAA 
GTATTTTTCATCAGTTAATATTGAAATAGCATTAGCACCATATTTTTGATA 
ATCTTTAACTTGTTGAACAAGATCACGTTGCG6TAATTGAGGTACAGATGG 
GCTTTTCGATTTTATTTCAGCAATAACTGATAATGTTCTATCATAGTTAAG 

35 TGAATCAATCAGCTTTCTCCTCTTAACATTTCCGTTATCTTGTAAAATTTC 
AAGTTTTTTATCATAGTATTTACGCTCAAGCAAAGTTTTTTTATACTCAAT 
AATTTCATTTAAAATAGTCATTACTTAACCTCCCATTTTTAAATATTGTTT 
CATTGCCATACCTGTATCAATGAGATATTTCGCTCTCTCTACGCCATGTTT 
GATACTTTCCACTTGCTCAGCAACATATAAAGCAATTCCAGCATTTAACAA 

40 AACTACATCTCGTTTACTTGAGTGATCCGTGCCACTTAGGATATTCAATGC 
AATTTGTTTATTTGTTTGAGGTGAACCACCTATCAACGTGTCATTATTTGC 
ATAAGCTAAACCGACTTCTTCTGCTTTTAAACTATATTTTTTTAATGCTCT 
TTCGCTGCTAACTTCATAAATGATATTTTCACCAGAAAGCGTGGCCTCATC 
CATCCCATTTGCACCATGAATTAAAATTGCTCGTTTTCTACCTA/^TCCTT 

45 TAATGTTTGTGCTATATTTTCAAGTTGTGAAGCTTCATATACCCCCATCAC 
TTGATAAGTTAATTTCAAAGGATTAATTAATGGTCCAATCAAGTTAAAAAT 
TGTAGGTGTTGCAATCGATTTTCTAATTGATTGAAGCTTTTTCATCATTGG 
ATAAGT^TCAGTTGCACTTATGAATGCTAATCCTTTCAAATTTAATTGTTG 
CTCTACTTCGTTCATTTTGTTTGTTTTTATATTCATTTCATGTAATACATC 

50 TGTACTTCCTGAATGTGAAGTAATACTTTTATTACCGTGTTTAATGACTGG 
CACTCCTGCACTTGCTACAACAAAAGCTACAGTTGTAGAAATATTA/yVGCT 
ATTTGATTGATCTCCACCTGTGCCACAAACACACATAGCTTTATTATAAAA 
TGGTTGGTTTGGATAGTTTGTCTGGATAAAATATTCAACTAAATACGTTAG 
CTCATATTGACCCATGTCTTTATTTGTATAAGCTTTCAATAATTCAACCTT 

55 TACATTGGTTTCTATATTTGAATCAAACAGTGTAACAATAAATGATTGCAT 
ATCTTTTTTAGATAAAGATTTATTTTGTTTAATTTTCTCAAGAAGGGTCAT 
TTTCAACTTCCTCTCGTTGCTATATTTATAAAATTAATAATGATTTGTTTC 
CCATTTTCTGTAGCGAATGACTCTGGATGAAACTGTATGCCATAATGTTTT 
CTGTTTTTGTGTTGGAAAGATTGTATACTATCATATGTTTGGCCAGTGATT 
AATAAACTTTCTGGAAATGAGCATGGATCACTAATTAATGAATGATAACGC
ATAACTTTGAAACTTGACTTAATTCCTTTGTATAGTGGTGACTGATTAACA 
ATAGTTAAAGTATCTACTTTTCCATGTTTGACACAACTACCTTGTATAACA 
TCGCCACCGTAATAACAAGTGAGCGCTTGTGCACCTAAGCAAACACCTAAA 
5 ATAGGTAAATTTTCGAAATTTTCAATTAATtTAATTAATAATTGATTGTCT 
AATGGATGTCCAGGTCCTGGTGAAATAATAACTGCTGTAATACCTTTCAGA 
TTACATATATCTGTATCATCGGGATATTTAACAATCGTTTCACATTGTTTT 
TCTACAATATCTACAAGATTGTAGGTAAAAGAATCATAGTTATCTATTATT 
AATATCATGGAGTTACCTCCAATAAACTTTTAGCTTTAAGTTTTGTTTCTT 

10 CAAGTTCTTTCTCTGGAATAGAATCATATACTACTCCACATCCTGCCTCGA 
CACTGACTTTTTCCTCATCGATAATCATGGTACGTATAGCCAATGCAAAAT 
CTAAATGATGATTACAGTTGATATACCCAACACCACCGCTATAGATACCTC 
TTTTATAAGGATAAGATTCGTATATTCTCTGTATAGCTCTAAGTTTAGGTG 
CACCTGAGACAGTACCCGTTGGTAGCAAACTTGCGATGACGCTCATAGGAG 

15 ATAGATGGGGTTTTAATTCTCCAATAACTTCACTAACGATATGCATGACAT
GTTCATAACGTTCTATTGTCATTAGTTTGGTAATTTGTGAAGTGCCTGTTT 
TACTTATTCGATGAATATCATTTCTTCCTAAATCTACGAGCATACGATGTT 
CACTCAATTCCTTTTCATCTTTCATTAATGTCTTTTCATTATTTTCATCTT 
CTTTTTTATTTTGACCTCTTTTAATTGTTCCAGCTATAGGATTCGTATAAA 

20 CTTTTCCATCTTTTACCTTTACAAAACTTTCAGGAGAACTTCCTATTACAA 
TCGGTACATCTTTATTAATATAATACATATATGGACTAGGATTTTGTCGCT 
TTAAATTTTGATATAACTGAAAAGTTAATTGATGTAAATTGTGTTGAAAAT 
GGTGTTTATAACTATAAATTCTTGAAGGAACTACTTGAAACATATCTCCTT 
CAGTAATTTTCTTTTTTAAAATTCTAATAGTTTGAACAAATTGTTGCTCTG 

25 ATATATTGGTGGTTATATGTCGAGGGAT 
30 ACAATTATATTTGATATAAGTATGGTTGTAATTGAAAATAAAAAATAGGAC
AGAATTCATGTTGAATCGACTGTCCCACTCTTGTGAATGATATTACTTTTA 
GTGATTCTTTGACATAGCTATCTTATTAGACAATATTAAGTTTATTGACAA 
TTATCATCAAAAGCATCTTTTAAATCCATAGCTATATCATTGATGTCGCGC 
CCTTCAATATGTTCGCGAGGTATAAAATGAACTAAATGTTGACCTTTAAAT 

35 AATGCGTATGAGGGACTTGAAGGAACTTGTTGGATGTAATCTCTCATTGTT 
TGTGTTGCTTCTTTATCTTGACCAGCAAATACAGTTACTTTATGATCTGGT 
TTCACTTCATTTTGCTCTGCAACTGCAACAGCTGCTGGACGTGCTAATCCT 
GCAGCGCAACCGCATGTTGAGTTGATTACAACAAATGTTGTATCATCATTA 
TCTATATTTTGCATATACTGATTGACGTCATCACTCGTTTCAAGACTTGTG 

40 AATCCATTGTCTGTTAATTCA6CTCTCATTTGTTGTGCAAGTTCTTTCATA 
TAAGCTTCGTATCCATTCATCATAAAATGCTCCTCTACTAATAATTGTTTC 
TAATTGTACTATATAGAAAATTAATTGCTTTTAGCGAACGTGTTTTCTATT 
TGTGGTAATAATTCATTCCAGTTAGCGTTATCTTCTTTATCAATTGATATA 
AAATCCAAAACATAGAATATTGATTTAACACCTTCTATTTCAAATAATCGA 

45 TTAATAAATTCAGGTTGACCTTCTTGAGCGGCAGTATATGTTGTAGATGAG 
TTATCTTGACGTGGTTCACTCAACGAAACTTTCATTGTATTATGATTAGGT 
GTTTCTGAAATTCCTATAATCTCCATGTGACATCCTCCTATTGTAACTCAT 
TACCAATTTGTTTAATTTGTTTACCGACCGTACATTCATTTATACAAAAAT 
GATGTGCTTGTGTTTTACCGTGAGTTTTTCTATTGTGCTTTTTTAATGGAC 

50 ATTGGTGACAATAAGTATTCATCAATTGGTCGATGTGTTGAATTGCTTGTT 
GTTCTGTATTAGTGAGTATAGCTTTCACTTCCCTTAGCTATAGACTATTTA 
TATTTATACCCATTTCACATGTTTGTATTATAGCATTTATAAGTTGGGAAT 
TTGAAATAATCAATTTTAGTTTTGAGAAAACTGATAACTTAGACATTTGTT 
TAAGTTATTATTTATTTACTCTTCTTTTTTAGCATTGCATTAAAGATGAAT 

55 TTTAAGTAAAATAGTATAGTTAATTTTGGCGGTTTCTAAACACCCGCTTTA 
ACAAAATTTAGGAGGAAAGTTATATGTATAATGAAATATTTGGTATTGCGT 
CATTTATTGTTACATTCGCTTTAATGGTACTGATGTATCGCTGTTTTGGTA 
AACAAGGACTAATTGCTTGGGTAGCAATAGGAACGATTATCGCTAATATAC 
AGGTCATAAAAGCGGTTCATATTTTTGGTATTACGGCTACACTTGGAAATG 
TCATGTTTGCTTCTATATATTTAGCTACTGATATATTAAATGACATCTATG
GTCGTAAAGTTGCTAAAAGAGCGGTGTGGCTTGGTTTCTCTTCTACCTTAG 
TAATGATTATAGTCATGCAAATGTCATTGCATTTTATTCCTGCCCCAGTAG 
ACAATGCGCAAAACTCATTAAAAATGATTTTTGATTTAGTGCCTAGAATTG 
5 CTATAGGTTCCATTATTGCTTATATCATAGGCCAACATATTGATGTATTTA 
TATTCAGTATGATTAAAAAGATATTTAGCTCTGATAAGACCTTTTTTATTA 
GAGCATATGGTAGTACCATTTTAAGTTCTATCATTGATACCGGTTTATTTG 
TTTCAATTGCTTTTATTGGTACTATGCCTGGTACTGCTGTTTTTGAAATAT 
TTATTACCACTTACTTGTTAAAACTAGTGTCAACTATTTTTAATGTACCAT 

10 TTGGATATATCGCTAAGTCACTATATCGAAAAGGAAAGATAGAACAACTAG
ATAATGGGTATTGATATTTATTGTATCAGAAATATAGATATACAAAAGCGA 
ATCAGAGTTCCATGTGCTGAACTAGCTTCGAAATTCAATTTGGTTTGTCCG 
TAGAATTTTCGAGCAAAATTCTCAATGATAGGACCTACGACCAAAGTTAAA 
TAAATCTTAATATAAGGGCTAAATTTAGACATCAATGCCCTATTTGTTAGC 

15 AGAACCTGAAGTATATATGTTATCTTAGGTTCTGTTTGTTTTTGATTTAAT
TTTTAGTGTTTTGTCTTTTTAGTTTGATACAATAGAATAGATACATTTTAG 
TGAGTAGACAATTCTGAAGAGGAGTAAAAGTAATGGCTAAAATACATTTTG 
ATGCTGCGACTAAAGGAAATCCCGGCCGAAGTGCTTGTGCGATTATTATTA 
AAGAAAATTCACAAAGATATACATTTACCCATGATTTAGGTGAAATGGATA 

20 ATCATAGTGCAGAATGGGCAGCAATGTTACACGCTTTGGAACATGCACGCG 
AATTAAAAGTATCTAACGCGTTACTTTTTACTGATTCAAAATTAATTGAAG 
ATAGTATGATGCAAGGTAAAGTTAAAAATGCTAAGTTTAAAGTTTATTTTG 
AAAACATAGAAATCTTAGAGCAAAGTTTTGATTTGATGTTTGTGAGATGGA 
TTCCACGAAAGCAAAATAAAGAAGCGAATCAACTTGCTCAACAAACACTAT 

25 ACAAACTTACATCATAATAATCTAAAGTGTTAATATGATAT6TTACGTATT 
TAATTGTTTGTTCAACCTTACCAATAAGATTTCGGCTTTGAAAAGCTTATT 
ATAATATGTTTGTACTGAATTATCGATGGTTAACAAAAAACAAGCCTGGGA 
CATTAAGTCCCGGGCTTGTTCTGTGTTAAAGATATATTTTACTTTTTAGAA 
CGCTTTTTGGTTTTTACAACTTTTTTAGTTGATTGAGGCGTTTTCTTCTTT 

30 TTAGAAGTAACTTTCTTCTGTTTTGTATCTTTGTTATCACTGTTTTGTTTA 
CGTTTAACGACGGGGATTGAATGAATATCTCCTTCACTCAATTGCTTTTCT 
TTAGATTTCTTATGTTTAGCAACTGGAATTGAATGTTTGTCATCGTTTTGA 
ATTTGTTCTTCTTTATCTATTTTACGCTTAGTTATTACAATTGGATCTATT 
TTATCAGTATCTGAAAGACGAATGTCTTTGTGATAATTTTTTATCGATTGT 

35 TTTTCTTCTTCTTTTTTACGTCTTTTAGCCAAAACTAACCAGAAACTAGAA 
AGTAAACCAGTTAATGTAATAACACCTACAGTTTTGCCGAAATTATCAACT 
AAATGTTTGAATAGG 



Sequence 3437

step.1003b07.cons.ok

CTTCAGCAGCAAAGTTAGCTTCACCTGAAATTCCATGTCTTTCATGTTGAA 
CTCTAAGCAACATAACAATATCTACTTTATCAATGACTTCATCAATTTCAA 
CATAAGGCGCCTCTAATGTATTATCTACCCATTCTTTTGGACTTGAGAACA 

45 TTACGTTGGCACCTAATGATGTTAAACTATGATAATTACTTCTTGCGACAC 
GAGAATTTTTAATGTCCCCACATATTAGAATATTCAAACCTTCAAACGAAC 
CATATTCTTCATATATTGTCATTATGTCTAATAAACTCTGAGTAGGATGTT 
GTCCACTTCCATCACCTGCATTAGCAATTGGAATATTTAATTGATCCAGTT 
CTTCGTAATAAGAATTTTGGGAGTGACGTATGACAAGTAAATCAACACCTA 

50 TACTTTCAAGTGTTTTACATGTGTCATAAAGTGACTCACCCTTTTTTACAG 
ATGATGTACTTGTTTCAAAATTAATAAGTTTTAATCCTAATTTTTGTTCTG 
CCATCTCAAAGCTACACTTTGTTCGCGTTGAATTTTCGAAGAATAAGTTTG 
ATACGTATTCACCGTTAAATTGAGGTAATGGTCGCTCACCAGATTTGAATT 
GGCAAGCGATAGTAATTAAATCATAAATTTCTGAATTAGATAAATGCTCCA 

55 TTGATAATAAGTGTTCCATAAAGCGCCTCCTATAGAAATAAAGCTTAATTT 
ATTTGGTTTTTATCTTTTGGCAAAATTAAATTTAATATTATTCCTGAAAGT 
GCTGATAATGCCATTCCTTCAATTTGTAAATTGATACCTATGCCTTTTAAA 
TTGATAAGAAGATTACCAATCCCGACAACAAGCACAACTGATGCTATAACC 
AAGTTGCGATTGCTTGCGAAATCTACTTGACTTTCAACAAGCATTCTTAAA 
CCACTAGCTGCTATAATACCGAATAATAAAATTGAGACACCACCCATCACT
GGCGTAGGTATTGAAGATATTAAAGCAGTGAACTTACCAATAAATGCAAGA 
ATGATAGCTATAACTGCCGCACCACCAATAACGTAAATACTATATATTTTG 
GTGATCGCTAGTACACCTATATTTTCACCATAAGTTGTACTAGGAGGACCT 
5 CCTATCATACTTGCAAACATAGTTGAAACACCATCACCAATGATTGATTTA 
TCTAAACCTGGATTTTCAAAGAAATTGCGTCCTACTATTTTATTAATTACC 
ATTTGATGACCAATATGTTCACTTACCGTCACAAACACCACGGGTATCATC 
ACGAGAATGAGTCCTAAATGAAAAGATGGTGTGTAATCTTTAAATGGTAGA 
TAAATATGAGGAAAATCTATCCATTTCGCTTGTGCTATTGGAGCAAATTTA 

10 ACTATGCCCATGAAAATGGATACAATATATCCCACTATAATACCTATAAGT
ACAGGTATTAGTGATAAAAATCCTTTGAAGAATCCTTGGACGATGATGGTT 
ACTGCTAATGTAATCAAAGCAACAATTAAGTAACTTAAGTTATACCCTTTC 
ATTTCAGCAGAATTTTCGAACATGGCCATGTTTACTGCTGTAGGAGCTAAA 
CTTAACCCAATGACCATTATTACTGGTCCGACAACTACTGGTGGTAACAAG 

15 TGCATTAACCAATGTGTTCCACTCAATTTAATGAAAATACCTATAATCACG
TACATTAATCCACTCATAAATAATGCAACCAGCATATCTCCAAGACTATGT 
GTACTTAACCCTGTAATGATAGGTGTAATAAAGGCAAAACTCGATCCTAGG 
TAGGCAGGAATTTTCGCTTTTGTGATTAATATGTATAAAAGTGTACCAATA 
CCTGATGCCAACAAAGCTGCTGATATAGGCAGATGTGTTAAAAATGGTACA 

20 AGTACAGTTGATCCAAACATTGCAAATAAATGTTGAAGACTCAGAAATGCC 
CATTGTGCTGGTTTTGGTTTATCTTTTACATCAAGAACTGGCTGAACGGTA 
CGCTCAAACATTTGCTCATTTTCCATGTTGATTCATTTCCTTTCATAAAAA 
AAGTCTCTTTACAATCAGTAGAATGTAAAGAGACTAAAATGAAATATATCT 
AAGTGTAATGTTAAGTTAAACTTTTATCATCACATTTATCAATGAAGCTTC 

25 TATTGGATATTTATAGTTTACATAATCTTTTTATAGTCATATGTAGATTTA 
TCCATAAGAAACACTTCACCTTTTTCAGTCTCTCGTACTGTTTTAAAAGGA 
TAATTATTCAATGACGACTGCATTTCTATCGTCAATTTCTTCTAAATAAAC 
TGAAACAGATTCATCTCGTGCTGTAGGTATATTTTTTCCTACAAAATCTGC 
GCGTATAGGGAGTTCACGATGACCACGATCCACAAGTGCTGCAAGCCCTAT 

30 TTTAATAGGTCTTGTATGTAATAAAATCGCATCTAATGAGGCTCTTACTGT 
ACGTCCGGTATACAAAACATCGTCAATGATAACAACCACTTTGTTATTAAT 
ATCTACATTAATATCAAAAGCGTATTGATCAGCTTGTTGCACTACCTTATC 
AACGTCATCTCGAAAATGCGTGATATCGATAGTACCTGTTGGTACTAATTG 
TTGTTCAATTGAATTTATTTTATCTTGTATACGATGTGCTAAAAAAGCACC 

35 TCTTGTTTTAATGCCTAATAGAACTAAATCTTTAGTTCCCTTGTTATATTC 
TAGAATTTCATGAGCAATTCGTGTAATTGTACGTTGTATCGCTGCCTCATC 
TAAAATAATTCGTTCAGACATTGTTTCACCTCTTTTAATATAAAAAAACTC 
ATATCTTTAATAAAGATATGAGACAAAAAGTTAAAGAGAATAATCTATATT 
CTCTATTATATTCACATTCTTAAGTTATAAACACAGCAAAAGTATTAGTAA 

40 CTTAATCATCAAAATATTATTATCCTAATTATTCATATCTTCGGAGCCTCT 
CTGGACCCTCAATTAAAGATGACGTTTTTATATTTAAACTGTTATTAGTTT 
AAAACTTTTAATAGTATAAGTCAATGGCAAAGTTAGTATTTATAATTAAAA 
ATCGATGCCTCATTCGTTTATGCATCTCTACGTCTTATATCATCTAGTAAC 
TTTTCAAAATCGTTAGGTAATGTTGCATGCTTCTCAATGTATTCATGTGTT 

45 ACAGGATGTTCGAAACCAATTATACCAGCATGTAAAGCTTGTCCATCGATA 
TCTAACGTTTTTTTAGGTCCATACTTCGGATCCCCTACAAGAGGATAACCA 
ATATATTTCATATGTACACGAATTTGATGTGTACGCCCTGTTTCTAATTGA 
CATTCTATCAATGTATAATCTTTAAAATGCTCTAATACATTAAAGTGTGTC 
ACAGCTTCTTTACCGTCATCAACCACAGCCATCGATTGTCTATCGTTTTTA 

50 TTTCGACCAATTGGTGCGTCAATAGTGCCATAATCATGAGGGATATTACCA 
TGAACTAACGCCGTGTATTTTCGTTTAACAGTTTTAGACATAAGTTGTTCA 
ACTAAATGACGATGAGCAACATCATTTTTAGCAACCATTAATAAACCAGAC 
GTATCTTTATCTATTCTATGAACTATGCCGGGACGTATTTCGCCATTGATT 
CCTGATAAATCTTTAATTTGATACATTAATCCATTTACTAATGTTCCACTA 

55 TAATGGCCTGGAGATGGATGTACAACCATACCTTTTGGTTTATACTTTAGA 
GCACAGTGGCG 
CTGTAATTATGAGTGAACGACTAGTTAAAGATGATGTATATACATCTATTC
ATATTGAAGAGTATGAGTCAGAAGCTCGTGATACGAAATTAGGACCTGAGG 
AAATCACACGTGATATTCCAAACGTATCTGAAAGTGCACTTAAGAACTTAG 
5 ATGATCGTGGTATCGTTTATGTTGGTGCAGAAGTTAAAGATGGTGACATCT 
TAGTAGGTAAAGTAACGCCTAAAGGTGTAACTGAATTAACAGCTGAAGAGA 
GATTATTACACGCTATCTTTGGTGAAAAAGCACGTGAAGTTCGTGATACAT 
CATTACGTGTACCTCATGGTGCTGGCGGTATCGTTTTAGATGTTAAAGTCT 
TTAATCGTGAAGAAGGAGACGACACTCTATCTCCTGGAGTTAACCAATTAG 

10 TACGTGTATATATTGTCCAAAAACGTAAAATACATGTTGGGGACAAAATGT
GTGGTCGTCACGGTAACAAAGGTGTTATTTCTAAGATTGTTCCAGAAGAAG 
ATATGCCTTATTTACCAGATGGTCGACCAATCGATATTATGTTAAACCCAT 
TAGGTGTACCATCACGTATGAACATTGGACAAGTATTAGAATTACATTTAG 
GTATGGCAGCTAAAAACTTAGGAATTCACGTTGCTTCCCCAGTGTTTGATG 

15 GTGCCAACGATGATGACGTTTGGTCAACTATCGAAGAAGCTGGAATGGCGC
GTGACGGTAAAACTGTTCTATACGATGGACGAACTGGTGAACCATTCGATA 
ATCGTATTTCAGTTGGTGTCATGTACATGCTTAAACTTGCACACATGGTAG 
ATGATAAGTTACATGCACGTTCTACTGGACCATACTCACTTGTTACACAAC 
AACCATTAGGTGGTAAAGCACAATTCGGTGGTCAACGTTTCGGTGAGATGG 

20 AGGTATGGGCACTTGAAGCTTATGGTGCTGCTTATACTTTACAAGAAATCT 
TAACTTATAAATCTGATGATACTGTAGGTCGTGTTAAAACTTACGAGTCTA 
TAGTTAAAGGTGAGAACATTTCTAGACCTAGTGTTCCAGAATCATTCCGAG 
TATTGATGAAAGAACTACAAAGTTTAGGTCTAGATGTTAAAGTGATGGATG 
AGCATGATAATGAAATCGAAATGGCTGATGTTGATGACGAAGATGCAGCAG 

25 AACGCAAAGTAGTAAAATATACTGGATTAAATAACATAATTTGATGAATCA 
AACCACTTACGCCGTGATTTTTTGGCACCCATAAAATTGGAGACATATAAA 
ACAAtATTCTCATTAATGCTTGCATAATCATCTGCGTATCTCTAATTAAAA 
TCCCCAAAGTGGATGTTAAAAGTGCCACCGACGATGTTAGCAAATATGCAA 
AAGGTACATATATAAGTAATTGTACAATGTGAATTGAAGGGATAATTCCAT 

30 TGAACATACAAGCTATTATAATAATTGCTAATAATCCTAAATGACCATAGA 
ACCTACTTGTTACAATATAAGTAGGAATGATTGAGAGTGGGAAATTCATCT 
TTGCCACTTGATTGAATTTCTGTGAGATTGATTTAGTTCCTTCTAGGACAC 
CTTGATTAATAAAGAACCACATACTTATACCAACTAACAACCAAAATACAA 
ACGGAATGCCGTGTATAGGTGCATTGCTTCTAATTCCTAAACCAAAAACAA 

35 GCCAATACACCATGATTTGTAAAGCAGGATTCAACACTTCCCATGCAATAC 
CTAGATAGTTATTATGATTGGTAATTTTAATTTGGAATTGTGCTAGTCTTT 
GAATTAAATAGAAGTTTTTAATATGTTCTTTTAAAACCGTTCCAACTGCGA 
ACATTAAACTTAACCACGCTTTCTCCTATAATAGATTATCTATCTATTGTA 
AATCATTAATAGTTTCTCTTTAATTTGATTCATTGAGTCCATTCTTAAATA 

40 ATAACATTTAACTGTTTATCATAAGCTCATTATATTATACAATCAATTCAG 
TTAAAAATTGTTAGAAATTTCGTGTAAGATTTAATTTTAGAAAATGAATCA 
CTTTCTTAACTCACTGTGTTATATTTATTTTCAACAAAATCTTCTATTCAC 
TTTACACTAGGTTAATAAATTTAACAACTGATGTGTTGTATATTATAATCT 
ATATAAATGTTTATAAGGAAGGTACAAAAATGAGCGTTTCGGTAAATATTG 

45 AAAATTTAACAAAAGAATACCGTATCTATAGAAATAATAAAGATAGAATTA 
AAGATGCATTAATACCTAAAAACAAAAATAAAACATTTTACGCTCTGGATA 
ACGTAAGTTTAACAGCGCATGAGGGAGATGTGATAGGTTTAGTCGGCATCA 
ACGGTTCAGGTAAGTCTACTTTAAGTAATATGATTGGTGGCTCTATTTCAC 
CAAGTTCCGGTGAAATAACGAGACATGGTGATGTGAGTGTCATCGCTATTA 

50 ATGCAGGACTAAATGGACAATTGACAGGTGTAGAAAATATTGAATTTAAAA 
TGCTCTGCATGGGCTTTAAAAGGAAAGAAATTAAAAAATTAATGCCGGAAA 
TTATAGAATTTAGTGAACTCGGCGAATTTATTTATCAACCTGTTAAAAAAT 
ATTCAAGTGGTATGCGTGCAAAACTTGGATTTTCAATTAATATTACTGTTA 
ATCCTGACATATTAGTTATTGACGAAGCATTATCAGTAGGCGATCAAACAT 

55 TTACTCAAAAATGTTTAGATAAAATTTATGAATTTAAAGCGGCTAAAAAAA 
CAATATTTTTTGTTAGTCATAATATTAGACAAGTGCGTGAATTTTGTACAA 
AAATCGCTTGGATTGAGGGCGGTAAACTAAAAGAATTCGGCGAACTTGAAG 
AAGTATTACCTGATTATGAGGCGTTTCTTAAAACTTTTAAGAAAAAATCTA 
AAGCAGAACAAAAGGAATTTAGAAATAAATTAGATGAGTCACGTTTTGTCG 
TAAAATAAATTTGAAGAAAGTAAAACTTAAACATTAAGGGAGTGCAATAAA
ATTCATCCGAATTGATTGCCTCCCTTTTTTACTCTTATTTTGATTTCATTT 
TATATTTTTGTATTGCTACCTTTAACATAAATTTAGGAATACTTATCATTC 
TCCCAATGCGTTTCCAATCAATAAGCACACGATATACCCACTCAATATTTA 
5 ACTTTCTAAATATTTGAGGTGCTCTTTTCTTTGAGCCACTGAATACTTCAA 
ACX3ACCCACCTACGCCCATCATCACAGTGTGCTTAAACTTGTCCTTATGCT 
TTTGAATCCATTGTTCTTGCTTTGGAAATCCCATTCCTACAAAAATGTAAT 
CGGGATTAAAACTTGTTATACGTTTTATGACTGTTTCATCTTCTAGATGAA 
TATAGCCATGATGATGTGCAAAGTGGATATTAGGGTATTGAGAT 


Sequence 3439 

step.1003b09.cons.ok

AGCAGAAGTAGCTGAAAACAACATTGTGAAGGAAAACCCAGATGTAGAAAT 

15 CTTTGAAGAAGGTATTATTGATTCATTTCAAACTGTAGGTTTACTTTTAGA
AATTCAAAATAAATTGGATATTGAAGTGTCAATTATGGACTTTGATAGAGA 
TGAGTGGGCAACTCCTAATAAAATTGTAGAAGCTTTAGAAGAGTTACGATG 
AAACTGAAACCATTCTTACCCATTATAATCAGTGGCGTCATATTTATTATA 
TTTTTATTTTTACCAGCAAGTTGGTTTACTGGTCTTGTGACAAATAAGACG 

20 TTAGCTGATAATAGAATTTCTTTAACTGATCAAGTTTTAAAAGGCACGCTG 
ATTCAAAATAAATTATTCGAGTCAAATAAGTATTACCCGATTTATGGTTCA 
AGTGAACTGGAAAAGGATGATCCTTTTAATCCAGGTATAGCGCTAAATAAA 
CAAAATGCTAGTAAAGAAACGTTCCTTATTGGTGCGGGAGGTTCGACAGAT 
TTAATTAATGCAGTTGAACTTGCAGCGCAATACGACAATTTAAAAGGAAAG 

25 AAATTTACATTTATTATTTCTCCACAGTGGTTTACGAATCATGGATTAACA 
AATCAAAACTTTGATGCTCGTATGTCACAAAGTCAAATCAATCAGATGTTT 
AATCAAAAAGATATGCCAGCTAATTTGAAGAAACGGTATGCACAAAGATTG 
TTACAGTTTCCGCATGCACACAATAAGTCATACCTTAGAGAACAAGCAAAA 
CATCCTAATGATGTCTCTGGAAACTACATTTCTTCATTTAAAGAAAATCAA 

30 TTAACTAAGATTGAAGCTATTAAATCATTATTCTCATTCACTAAGCCACCT 
CTAGCAGAAGTAAAACCTGCAACAAGAGAAGATGCTTCATGGGATGAGATG 
AAACATAAAGCTGCCGATATAGGCAAAGCAAATACTCAATCTAATAAATAT 
GATATAAGAGATCCATATTGGAAATTGATAAAACAAAACAAGCGTAAAATC 
AAAAGGGATTATGAGTTCAACATTAACTCACCCGAGTTCCAAGATTTAAAA 

35 TTATTAGTGCAAACGCTACATGCTGCTGGAGCTGATGTACAATATGTTTGT 
ATACCTTCAAATGGAAGATGGTATGATCATATAGGTATC7UVAAAAGATAGA 
CGTGAAGCTGTATATAAAAAGATTCACTCAACTGTAGTTGATAATGGTGGG 
AAAATTTATGATTTGACAAATAAGGACTATGAAAAGTACGTAATTAGTGAT 
GCTGTTCATATTGGATGGAAAGGTTGGGTTTACGTCGACCAGCAAATTGCA 

40 AGACATATGGATGGTCATGCGCCTAAAAATCATGAAGTCGATTATTCTAAA 
AATAAACCACCGCACAAACATCACAACGATCGTCAAGATGATCAACATCAA 
GGCAACAAATAATATTATTAACCGAAATAAAATTAAGTGAGCTAGGTACAG 
AATTCGTTATTTCATCCCGTGTAACGAGATTTATAAAGGAAATACCTCGAC 
GTAATGTAfTTTTCGTTTACTTCTTTGATAGCCAGTATATAAAATGCTTAG 

45 GAACAATGTTTTGTATCCTAAGCATTTTATTGTTTTATTTAGCATCATAGA 
GTTAATATTAGTGCGCTATGACTTCATTTATTAATTTCTATCAAATTAAAA 
AACCTAGGACGTCACTATCTATCCTAGGTTAATAAATAATTATATAGGGAT 
GTATTAGAATACTTGTTCTACTTCTATTACACCTGGAACTTCTTCATGAAG 
TGCACGCTCGATGCCAGCTTTTT^GGTAATTGTAGAACTAGGACATGTTCC 

50 GCAAGCGCCATGTAATTGTAGCTTAACAATACCATCTTCAACATCTACAAG 
CGTACAATCGCCACCATCTCGTAATAAGAATGGACGCAAACGTTCTATAAC 
TTCTGCTACTTGATCAAACATTGTTGGATTCTCAGTAGGCATGTGACATGT 
CTCCTTTCAGAGCAATTATTTCATAATCATTATTATTACATTATAATAGAT 
AAAGTAGTAAAAATCTATAAAACAAGATAGGGGATTTTAATGACTAAAATT 

55 AGTGTTGTCGTATATGGAGCAGAAGTCGTTTGTGCGAGTTGTGTAAATGCA 
CCTACATCTATAGATACTTATCAATGGCTTCAAGCATTACTTTTAAGAAAG 
TTTCCTCAACATCATTTTGAATTTACATATATTGACATACGAAATGATACT 
GAAAATTTAACTGATCATGATATGCAATTTATAGAAAGAATTAATGAAGAT 
GAATTGTTTTACCCATTAGTTACGATGAATGATGAATATGTAGCAGATGGT 
TACATACAATATAAACAAATAACCCGTTTTATTAAATCATATTTTACTATG
TAAGATTAAAAGAGGGGGCAATATTCATACTAACGATGTGCCTCAATTGAC 
AGTGTCTGTGTTATTTTTTAGTGAAAATCTTCATTTTTAGGCTATATCTTT 
TATTGAGAAGAAGGACATTTCAATGCTGTCATAGTTTGTGATTATTAATTA 
5 AGCACCTAGGGAAATGTAAATGCCCTAGGTGCTTTATTTATTAACCATTAT 
GATATTTATAGCGCCACAGAACACCTGATTTTAGAATAGAGGCTAATCGTC 
CGGTAACTGTTCGATCCATGATATAAGCAAAACCTTGTTTGTCACCTAAAG 
AGCCTAAAAAGCCTTGTACTTTAATTTCAGGCATTTTATCTGGAAGTGGTT 
CGTTATTCCATTGCTTCTTCAACACCTCAGCAATCTGTTCACCTTGTAGTT 

10 CTGCTAGTTGAGCACTGGGTGCATGTGGTAAATTAGCACAGTCACCTACGA 
CATAAACATTTCTATAGGTTGGGACTTGATGGTACTGATTAATAATTACGC 
GTCCAGTGGTACTCATATCAATAGGAAGATTACGCACAATTTCAACAGGTT 
GTATGCCTGCTGTCCAAACGACTAAATCAATATTTTCTGGTTTACCATTAT 
TATAAATTTTTCCGGGTTCTACTCTGTCGATGACTGAATTAGGTACTACAG 

15 TAACATTGTGTTTAGAAAACCAATTAGATATGTATTTACTCAGTTTCTCTG
GAAAATTCCTTAAAATTCGAGGCCCTCTATCATATAACAAAATTTCCAAGT 
CTGATCGACTTTCACGTAGCTCGCTCGCTAATTCAATGCCACTTAAACCTG 
CCCCAACGATACCTACACGTGCCCCTTTAGGTAACTCGCTAATTCTATGGT 
ATGTTTCACGCGATTTAGATAATGTTTGAATGCTATGTGTATATGCTTCAG 

20 CACCAGGGACATTATGATATTTATCCTCACACCCTAGACCAATGATAAGTT 
CGTCATAATCTATTTTTGAATTTCCAACTGTTATCATTTGTTCGTCCAAAT 
CTATATCACTGATTTCCCCATAAACCGTATTAATTTGACTGCTATCTGGAA 
ATTGGATTCGCACCTCTTTGTCAGATTTAGTTCCTGCTGCAAGTGCATAAA 
ATTCAGGTTTTAAACCGTGGAATGGCATGCGGTCGATTAAAGTTAAGTGAT 

25 ATCCCTCAGGAATTGAATGAGGTAAAATGCGCGACATAATTCGCATATTAC 
CATAGCCCCCGCCTAGTAATACTAAGTTTTTCATTGATTCTTTAGAGCACA 
GTGGCGATGATATCAGAT 
ATTCAACCACTTTACGAATATATATAATAATAACGTTTGCATTACCATATT
AACTATCTGAGTGATAGGAAATCTTAAAAACTTTTCTATTGTAGGTTTTAC 
TTTATATACAAAATAACAATTTAAATAATATGAAATAATAAAACTTACAAT 

35 AAATCCAACAATGTGACTAACCATATAATTCACATGTAACACCTTTAACAA 
AAATAAGTATGTTATATAGTAGTTAAAGGTATTAATTCCACCAACTATGAT 
AAACTTTATAATCTCATAATGTATTCGGGTTAACTTCATAATATTTCCCCT 
TTGAACAGTTATTGTTTTGTACCATATGGGACTACTTTAAATGTGATGTCT 
TTGGGCAAATCATAATAGTGTTTGAAATCTTGAAGTTGTTCCTTCGACTTT 

40 GGTGTAGTCATATGAGTATATCGCTCAAATGGTAATCTTTCCAATGTTATT 
TTCTGATAGCGATGATGTTGTGTTATTTGTTCTTTAATGAAGTCTATTCTG 
TGCACACTACTAATATGTATAAAAGTAAATCCAATCATCATAATGATGCTG 
ATGATAATCGCAAATATTTTAATTATATGTTCAAGTTGTTTAAATAGCACA 
TCACATTGCTGAATTAAACAAAGTAATATCACGATCCATAAAGTATAAATA 

45 AAATAAAAATTTCTATAACTTATAGGCGTCACAAATAAAAGTGGCAAAACA 
GATGAAGCCATAGCTATAAAACTCCCCATCACAATCATTCTTATGTATCTT 
TGCTGTATCATTTTAAACACAACGTATATCACACTTATCATGTAAATGAAG 
CAAATCGTTGTATTCAAAACGGCTATAGAAAATGAAGCTTTATATAATTCA 
AAATGAAATTGATTGTAAACGAAGATCTTATAAATAGGTAAAGTAATTAAA 

50 CCTAAGAGTAGTGGTATTTTAATATAAACTCTCATATGCTTCAGGCTTTTA 
TTTTGCTTAAGTAAAACTATACTTACTATTGATATCACGGTAAGAATAATC 
ATTTGATTAATAAACATATATTCTGGTACAAGCTTAAATAACGTCACACCT 
GCTTTATGTATCATTCCATGACTATCGGAAATTGAATAATGCGTATTTAAT 
CCATCCTTAATTAAAAAATAATTGAAGTTTAAAAACATTATAATGTTACCT 

55 ATACAACTAAGCATAAATCCTACAATTAAGAAATAACTGAGTCTTTTTTTA 
ACAAAGAAATAGACTACCATTCCTATTAAAATAATTAAGCTATTAGCGATG 
GAAAGATTCTCCAAGAAGAATTGTCCAAACAAACTTACTAATAAAAATACC 
CATAATTGCATTTCAGAAACTGTATCGTGCGACTCAATCTTTTTAACTACC 
GTAAAAAGAATAAAAAGTGATAGGACTGTAGCAGGTATATAACTAAAAAAT 
CCAGTAAACCACCCGTAAGTTTCGCTATAAATTGTATTAGGTACAGTAACC
ATTAACACAAAACTCAAAATAAAATAAATACGATTCGTATGTAATTGAACC 
ATATAAGCAACTAAATAGATAACTAAAAACGAAGTTATCGCATATATTAAA 
GCTCTTATTATGTTATTATGTACGGCAACCCATTCAAATAAATGTCCTAAA 
5 TAACGACCATTTTCTTGAGTCAAATATTGACTTAGATTAACTTTATATGCA 
TGCCAATCCGTAGTTGATAATGGTGTTAGAATTGCCATTATAAGATAAAAG 
ATAAAAATTCCTATTAACCAAAAAAATGATTCATATTGTTTTAAGTTATTC 
ATTGTTGACCTCTCTGTTCTTCATGATGTAATTCAAAACTAACATTTAAAC
GATTATCATGTTTATTCTCTTTTTTATGTGATTTATTCAATTATCTACTGA 

10 AAATATATAGACATTTATTGAAGCTTTAATCATATTAATATATACATATTG 
CTCATCGTCAATTTTATATTCTTATTATTATACAATTCACGTATGATTTCC 
ATTTTAATTATCAAGATTTCTTTAGTTTAATTTCATTTTTGGATAGAATAA 
ATATAAACCAATTGTTTAGGAAGTGATATAAATGTCACCATATAAAGTTAT 
CATTGCACCTGATTCTTTTAAAGAAAGTATGTCGGCGAAGGAAGCTGCTTT 

15 AGCTATTAAAGATGGATTCCAAGAGGTGTTCGATTCCAGTACAATATATGA 
CATTATTCCTATGTTGAAATAAGTCATCGTATTCATGAACGCCCTGAATTA 
GGCAATGAAGAAATTTTTGCATCGAGAACATTAATTGACCAATTAAGAGCA 
AATCGATTCGAAATCGAAACGGATATTGCAGGACATGCAACAGGATTTATA 
GCAACGTATGATTCTGATATGACTGGACCGGTTATAGGATTTCTAGCTGAA 

20 TATGATGCTTTACCTGGTCTTGGTCACGCATGCGGGCATAATATTATTGGT 
ACTGCTAGCGTACTTGCTGCAGTAGCACTAAAAGAAGTCGTCGATGAAATT 
GGTGGTAAAGTAGTCGTTTTGGGATGTCCTGCTGAAGAAGGTGGGGAAAAT 
GGCTCCGCAAAAGCTTCTTATGTTAAAGCAGGTGTCATTGATGAAATTGAT 
GTAGCATTGATGATTCATCCTGGAAATGAAACTTATCGTACAATTAATACT 

25 TTAGCTGTGGATGTTCTAGATATTAAATTCTATGGACGTAGTGCGCATGCA 
TCTGAAAATGCAGATGAAGCATTAAACGCTTTAGATGCAATGATTTCATAT 
ATTAATGGTATAGCACAGTTAAGGCAACACATTAAAAAAGGACAACGAGTT 
CACGGGGTTATTTTAGACGGTGGTAAAGCGGCTAATATTATACCTGATTTT 

30 

Sequence 3441 

step . 1003c04 . cons . ok 

GTATAAGTAGCTAATTTTTCATCGTATAAAGTGTAAGGAGATTGTCTGCCA 

35 TTGACAATAGCGTTCCCTTTAAATAATTTAATTCTCACATCTCCCTCAACA 
TATTGTTGAGTACTATCGATAAAGAGTTTTAAACTATCTGTTAATGGCGAG 
AACCACAAACCATTGTATATTTGTTCTGAAAATTGTTTTTCAATGACAGGC 
TTAAAGTGCGCTACGTCTTTAGTTAATGTAATTGTTTCTAGTGCTTTGTGT 
GCTTTTAAAATAACTTCCGCACCAGGTGTTTCATAAATCTCTCTCGATTTT 

40 ATCCCGACCATTCTGTTTTCAACATGATCGATTCTACCAATACCGTGTTTG 
CCAGCAAGTTGATTCAAGTAAAGAATAAGGTCATCTAATTGATAATCTTTG 
CCATCAACTTGTACTGGAATACCTTGTTTAAATGTAAGGATAATTTCGTCT 
GCATTGTCTGGAGTTTCTTCTAAAGGTGTAGTTAAATCAAATGCATCTTCC 
GGAGGTGCGGCATACGGATCTTCTAAAATACCACATTCATTAGCTCTCCCC 

45 CATAAGTTTTGGTCAATTGAGTATGGCGAGTCATAATTGATTGAAACAGGA 
ATATTATGTTTGATTGCGTAATCAATTTCTTCTTCTCTGCTCCAAGCCCAT 
TCACGAACAGGTGCAAATGCTTTTAACTTAGGATTTAAAGCTTTGATTGCC 
ACTTCGAAACGTACTTGATCATTACCTTTACCAGTACATCCATGCGCAATA 
CCAATAGAATTTGTTTTTTCAGCAATTTCAACCAGTTTTTTCGCGATGAGT 

50 GGACGTGATAATGCTGAAACTAGAGGATATGCATTTTCATACATTAAATTT 
CCTTTAATAGCATAACTTACATAATCATCACTAAATTCTTTAGTTGCATCA 
ATAATATGACATTCGACTGCACCCATATCTAAAGCTTTTTGATATACAACG 
TCTAAATCTTTGCCTTCGCCTACGTCAAGACAACAAGCAACTACATCATAT 
CCTTTATCAATAAGCCATTGAACTGCAACGCTTGTATCTAAACCACCTGAA 

55 TATGCTAAAACGATTTTATCTTTCATATTTACACCTCAAATAATTTTTTAT 
GATTATTAATATGCATCTATCATATCAGATTGATAATGAAAATAAAACCAT 
TATTTTACATAAATATTCATTTTTAAATCTGATTATATTTAGATATTTTTG 
TTTGAAATACTGTTATAAAAGCTTTTATATATATTTTAAAATTTCTGTAAA 
TTGTATATCTAAAGTGGTGGTGTGACCTTTGATATTAAAAATAACTTTTTT 
ATGAATGAAAGGTTACATAAAAAATAATTTTCCTGTGATAAGTTAGCCTTG
AGAATTGAATAGCGCATCTAAACTTAGTAGAATTAGTATAAGTACTTATAA 
CAGGAGGCAATAAGATGACTCACATTCAATTAGACTATGGCAAAACTTTAG 
AATTTTTTGATAAGCATGAACTAGATCAGCAAAAGGATATTGTTAAAACTA 
5 TCCATCAAACTATTCATAAAGGTACAGGAGCAGGTAATGACTTTTTAGGTT 
GGTTAGATTTACCTGTTGATTATGATAAAGAAGAATTTTCTAGAATCGTCG 
AAGCATCTAAACGTATCAAATCAAATTCCGATGTACTTGTTGTTATCGGTA 
TTGGAGGTTCATACTTAGGTGCACGTGCTGCAATCGAGATGCTTACATCTT 
CATTTAGAACAAATACGGAATACCCTGAAATTGTATTTGTAGGTAATCATT 

10 TATCCTCAAGTTATACAAAAGAATTACTTGATTATTTACAAGGAAAAGATT
TTTCAGTTAACGTTATTTCAAAATCAGGTACTACGACAGAACCAGCAGTTG 
CATTTAGATTATTTAAACAATTGGTTGAAGAAAAATATGGAAAAGATGAAG 
CTAAGAAACGTATTTTTGCAACGACAGATAAATCTAAAGGTGCACTTAAAC 
AATTAGCAGACAATGAGGGTTATGAGACGTTTGTTGTACCTGATGATGTGG 

15 GAGGTCGTTATTCTGTTCTTACAGCTGTAGGATTACTACCAATTGCAACTG 
CAGGTATCAATATTGAATCAATCATGATTGGTGCGGCTAAGGCACGTGAAG 
AGTTATCTTCTGATGATTTAGATCAAAATATCGCATATCAATATGCAACTA 
TTCGAAATATTTTATACAGCAAAGGTTATACTACTGAAATGTTAATTAATT 
ACGAACCCTCTATGCAGTATTTCAACGAATGGTGGAAACAATTATACGGTG 

20 AATCAGAAGGGAAAGATTTCAAAGGTATTTATCCATCAAGTGCGAATTACA 
CAACTGATTTACATTCCTTAGGACAATATGTTCAAGAGGGCCGTCGTTTCT 
TATTCGAGACAGTGGTTAAGGTCAACCATCCAAAACATGATATCAAAATTG 
AAGAGGATGCAGATGATTTAGACGGACTGAACTATCTTGCTGGCAAATCAA 
TCGATGAAGTGAATACTAAAGCATTTGAAGGTACATTACTTGCACATACCG 

25 ATGGTGGCGTTCC7VAATATCGTTGTAAATATTCCTCAGTTAGATGAAGAAA 
CATTTGGATATGTTGTTTATTTCTTTGAATTAGCTTGTGCAATGAGTGGAT 
ATCAATTAGGTGTTAATCCATTTAATCAACCTGGAGTTGAAGCCTATA7UVC 
AAAATATGTTTGCGCTATTAGGTAAACCAGGCTTTGAAGATAAGAAAAAAG 
AATTAGAAAATCGTTTATAATTTTAAGTTGAACTTAAAAATGAGCCTAGGA 

30 TTATCTAGGCTCATTTTTATTATGTGTATTAATAAGAGTAAATGAAAATTG 
ATTAACGTCAGATTAATAAGAAGTCAAATTCAGATAGGAAATATTAAATTT 
TCAATTCATTAGAGTTTCGATAAAGAATGGAATAATAAAATTGATGAATTA 
GATATCAATATATAGATTTATTAATTCCTACCTAAGTCTGATTTATGTTGT 
TGACCTAATTTTAATGTTTCAAAAAATATCATTTACAACTATAATGAATAA 

35 GTAATACAACCCTAAGAAAGGATTTGATGTATTGTCGCTTTCTCAGTTAGA 
AGAATGGTTTGACGCATTTCGACAATTCGGATATATTCCTGG 



Sequence 3442 

40 step . 1003c05 . cons . ok 

ACATTCAAAACTAGATAGTAAGTAAGATTTTGCGTCGCAAAACGTTTTTTA 
AAAATTGATTAAGTCTTCGATCGATTAGTATTCGTCAGCTCCAC6TGTCAC 
CACGCTTCCACCTCGAACCTATTAACCTCGTCATCTTCGAGGGATCTTATA 
ACCGAAGTTGGGAAATCTCATCTTGAGGGGGGCTTCATGCTTAGATGCTTT 

45 CAGCACTTATCCCGTCCATACATAGCTACCCAGCTATGCCGTTGGCACGAC 
AACTGGTACACCAGAGGTATGTCCATCCCGGTCCTCTCGTACTAAGGACAG 
CTCCTCTCAAATTTCCTACGCCCACGACGGATAGGGACCGAACTGTCTCAC 
GACGTTCTGAACCCAGCTCGCGTACCGCTTTAATGGGCX3AACAGCCCAACC 
CTTGGGACCGACTACAGCCCCAGGATGCGATGAGCCGACATCGAGGTGCCA 

50 AACCTCCCCGTCGATGTGAACTCTTGGGGGAGATAAGCCTGTTATCCCCGG 
GGTAGCTTTTATCCGTTGAGCGATGGCCCTTCCATGCGGAACCACCGGATC 
ACTAAGTCCGTCTTTCGACCCTGCTCGACTTGTAGGTCTCGCAGTCAAGCT 
CCCTTATGCCTTTACACTCTATGAATGATTTCCAACCATTCTGAGGGAACC 
TTTGAGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCC 

55 CGCCTGACACTGTCTCCCACCACGATAAGTGGTGCGGGTTAGAAAGCCAAC 
ACAGCTAGGGTAGTATCCCACCAACGCCTCCACGTAAGCTAGCGCTCACGT 
TTCAAAGGCTCCTACCTATCCTGTACAAGCTGTGCCGAATTTCAATATCAG 
GCTACAGTAAAGCTCCACGGGGTCTTTCCGTCCTGTCGCGGGTAACCTGCA 
TCTTCACAGGTACTATGATTTCACCGAGTCTCTCGTTGAGACAGTGCCCAA 
ATCGTTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCT
ACCTTAGGACCGTTATAGTTACGGCCGCCGTTTACTGGGGCTTTGATTCGT 
AGCTTCGCAGAAGCTAACCACTCCTCTTAACCTTCCAGCACCGGGCAGGCG 
TCAGCCCCTATACATCACCTTACGGTTTAGCAGAGACCTGTGTTTTTGATA 

5 AACAGTCGCTTGGGCCTATTCACTGCGGCTCTTCTGGGCGTGAACCCTAAA 
GAGCACCCCTTCTCCCGAAGTTACGGGGTCATTTTGCCGAGTTCCTTAACG 
AGAGTTCGCTCGCTCACCTTAGAATTCTCATCTTGACTACCTGTGTCGGTT 
TGCGGTACGGGCACCTGTTATCTATCTAGAGGCTTTTCTCGGCAGTGTGAA 
ATCAACGACTCGAGGAAACAATTTCCTCTCCCCATCACAGCTCAGCCTTAT 

10 GAGTGCCGGATTTGCCTAACACTCAGCCTTACTGCTTGGACGTGCACTCCA
ACAGCACGCTTCGCCTATCCTACTGCGTCCCCCCATCGATTAAAACGATAC 
TAGGTGGTACAGGAATATCAACCTGTTATCCATCGCCTACGCCTGTCGGCC 
TCAGCTTAGGACCCGACTAACCCAGAGCGGACGAGCCTTCCTCTGGAAACC 
TTAGTCAATCGGTGGACGGGATTCTCACCCGTCTTTCGCTACTCACACCGG 

15 CATTCTCACTTCTAAGCGCTCCACATGTCCTTGCGATCATGCTTCGACACC
CTTAGAACGCTCTCCTACCATTGTCCAAAGGACAATCCACAGCTTCGGTAA 
TATGTTTAGCCCCGGTACATTTTCGGCGCAGTGTCACTCGACTAGTGAGCT 
ATTACGCACTCTTTAAATGATGGCTGCTTCTAAGCCAACATCCTAGTTGTC 
TGGGCAACGCCACATCCTTTTCCACTTAACATATATTTTGGGACCTTAGCT 

20 GGTGGTCTGGGCTGTTTCCCTTTCGAACACGGACCTTATCACCCATGTTCT 
GACTCCCAAGTTAAATTAATTGGCATTCGGAGTTTGTCTGAATTCGGTAAC 
CCGAGAGGGGCCCCTCGTCCAAACAGTGCTCTACCTCCAATAATCATCACT 
TGAGGCTAGCCCTAAAGCTATTTCGGAGAGAACCAGCTATCTCCAAGTTCG 
ATTGGAATTTCTCCGCTACCCTCAGTTCATCCGCTCACTTTTCAACGTAAG 

25 TCGGTTCGGTCCTCCATTCAGTGTTACCTGAACTTCAACCTGACCAAGGGT 
AGATCACCTGGTTTCGGGTCTACGACCAAATACTCAACGCCCTATTCAGAC 
TCGCTTTCGCTGCGGCTCCACATTTGCTGCTTAACCTTGCATCAGATCGTA 
ACTCGCCGGTTCATTCTACAAAAGGCACGCCATCACCCATTAACGGGCTCT 
GACTACTTGTAAGCACACGGTTTCAAGTTCTCTTTCACTCCCCTTCCGGGG 

30 TACTTTTCACCTTTCCCTCACGGTACTGGTTCACTATCGGTCACTAGAGAG 
TATTTAGCCTTAGGAGATGGTCCTCCCAGATTCCGACGGAATTTCACGTGC 
TCCGTCGTACTCAGGATCCACTCAAGAGAGAATATGTTTTCGACTACAGGA 
TTATTACCTTCTTTGATTCATCTTTCCAGATGATTCGTCTAACATGTTCTT 
TTGTAACTCCGTATAGAGTGTCCTACAACCCCAACAAGCAAGCTTGTTGGT 

35 TTGGGCTCTTCCCGTTTCGCTCGCCGCTACTCAGGGAATCGATTTTTCTTT 
CTCTTCCTCCGGGTACTAAGATGTTTCAGTTCTCCGGGTCTGCCTTCTGAC 
ATGCTATAAATTCACATGTCGATAACATGACATAACTCATGCTGGGTTCCC 
CCATTCGGAAATCTCTGGATCAACGCTTACTTACAGCTACCCAAAGCATAT 
CGTCGTTAGTAACGTCCTTCATCGGCTTCTAGTGCCAAGGCATCCACCGTG 

40 CGCCCTTAATAACTTAATCTATGTTTCCACCATATTTTGAATTGTTATTCA 
AAATAAATAGCTAAAACTAGTTATTAATCTTGTGAGTGTTCTTTCGAACAC 
TAGCGATTATTTATGAATTCAAGCTTATTTAAAACTCTATTCACTCGGTTT 
TGCTTGGTAAAATCTTACTTGCTTATCTAGTTTTCAATGTACAAATGAATG 
TTAATAAACATTCAAAACTGAATACAATATGTCACGTTATTCCCTCATCTT 

45 CGTAGAAGATGTTCCGAATATATCCTTAGAAAGGAGGTGATCCAGCCGCAC 
CTTCCGATACGGCTACCTTGTTACGACTTCACCCCAATCATTTGTCCCACC 
TTCGACGGCTAGCTCCAAATGGTTACTCCACCGGCTTCGGGTGTTACAAAC 
TCTCGTGGTGTGACGGGCGGTGTGTACAAGACCCGGGAACGTATTCACCGT 
AGCATGCTGATCTACGATTACTAGCGATTTCACTAATTTCTTGAAAAGTTT 

50 TTTCTAATTTTTCAAAGAAACCACCTTCTTGATGTTGCTCGTTTGAGGTTT 
TTTGATTGGGTCTTTGATTTTGTCTGTCCTCGTGACTTTTATCTCGTAATG 
CGCTAACACCTGTAATTATCACAGATATGATGAAAATGATAATGCCAATAT 
TCATCTTATCACCTCTTCATTATTGTTGTGGTGATTCATCATCGTTTTGAT 
CAGTGCGCTTATTTATTXXTATTTCTCATACCTGTATCCGCTTCA 

55 

Sequence 3443 

step . 1003c07 . cons . ok

TGTAGCAATACCACCATGCAACTGACCTACGTTTTCAAATCCTTCTTTTAC 
TAACCAACCAGAAAATTTTTCACAACGAATGCCACCTGTACAATATGTGAC
AATATTTTTTCCGTCGAGTTGTTCTTTATTATTACGCACCCATTCAGGTAA 
GTCACGGAATCGTGTTATATCAGGACGAATAGCTCCACGGAAATGTCCTAA 
ATCGTATTCATAATCATTTCGAGCATCTAATATAACAGTATTTTCATCTTC 
5 TAGTGCGGCTTTAAATTCTTTAGGAGAATAGTATTTACCGGTAATTTCACG 
TGGATTAATATCTTCTTCTAAGTCAAGTGCAACAATTTCACGTCTTGGACG 
CACGTGCATCTTTTTAAACGCATGACTTTCAGCTTCATCAATTTTAAAAGT 
TAAATCAGCAAAACGACTATCTGCATGCATATGCTCTATATATTTATCAGT 
ATCTTCTTTTGTTCCAGATAATGTTCCATTAATGCCTTCCGTTGAAACTAG 

10 TATTCTTCCTTTTAAATGATGTTCCTTACAAAATTTCAAATGTTCGGCTGC
AAAAGTTTCAGGGTCATCTATAGTTACATATTTATAATAAAGTAGTACTCT 
ATAATCCATAAATCTGTGCTCCTCTTTAGTCAATTTTAGTTAATTTAAAAG 
TATTTGAAGTTATATAAAAAAATGATAGTGATTGAAGTTTAAAAATTAGAC 
ATTTTAATATTGTATCAAAATGAGAATTGCTTTCAATAAGATAGCCTATAA 

15 ATCCGTCATTTGACCGATATAAGATGTGGTGGTTTTCATATATGAATAAAA
GTGTAGTCGTGTAGCTATAATGACATTTAATATAAATTTTTATAAAATGAG 
AATAAAGTCACTGAATAAAGAGAGGGATTCAATCGATGGCAAATCAAAAAT 
TACCAACATTAAAATATACTGGTAAATCAGAAAGTGCAGTGCCAATTGTGT 
CAGAAAGTGAATTGCAAACGGTAACAGCAGAGCCGTGGGTGAAAATTTCAG 

20 ATAAAGGGTTACAACTAGAAGGACTTAATTTTAATCGCGAAGGTCAGTTAT 
TCTTATTAGACGTGTTTGAAGGGAATATTTTTAAAGTTAATCCCGCAACAA 
AAGAGGTTACAACAAAATTTCAGTCTGTTAAAGATAATCCGGCAGCGATTA 
AAGTACATAAAGATGGTCGTTTATTTATCTGTTATCTAGGTGATTTTAAGA 
CAACTGGAGGCATATTTGCGACAACAGAAAAAGGTGAACAAATAGAAGAAA 

25 TTATTTCTGATTTAAATACAGAATATTGTATTGATGACATGGTTTTTGACA 
GTAAAGGCGGATTTTATTTCACTGATTTTAGAGGGTATTCTACACAACCTT 
TGGGCGGTGTTTACTATGTAGATCCAGACTTTAAGACGGTTACGCCAATTA 
TTCAAAATATTTCTGTGGCGAATGGTATTGCTTTAAGTACGGATGAAAAAG 
TGCTATGGGTAACTGAAACTACAACTAATCGACTTCACCGAATCGCATTAG 

30 AGGATGATGGCGTGACTATTGCACCATTTGGAGCGACAATACCATATTATT 
TTACAGGTCATGAAGGACCGGATTCTTGTTGTATTGATAGTAATGATAATT 
TATATGTGGCTATGTATGGCCAAGGACGTGTATTAGTTTTCAATAAGAGAG 
GTTATCCTATAGGTCAAATTTTAATGCCAGGACGTGATGATQGAAAGATGT 
TACGTACAACACATCCACAATTTATACCTGGTACAAATCAACTTATAATTT 

35 GTACTAATGATATTGAAAACCATTCTGAAGGTGGATCTATGCTTTATACAG 
TTAATGGTTTTGCTAAAGGATATGAGAGTTATC7VATTTCAATAAACTCTTG 
AAAAAGCGTATAGAATAAGTTGTTATGTATAAATAAAAGAAGTAGAACAAA 
GGTTGAATAAAACTTTATGTTCCTCACTCGTATAGCTTAATTTGAAAATTT 
GATACAACTCATTGTATTTTCTATTAATGCAACGATTGCCATAAATAAATA 

40 TGCCTGAGACATTTAATGTGTCCCAGGCATTTATAATATGTGTTCTTGTAT 
TTTACTGTTAAAGATi\AAAGCTTAAGAAAATTTTCTATAACAAATCGTTTA 
ATTGTCGATGCTAAATTCTGCTGATGCTTCAAATTTTACGTTTTTACCTAA 
CATCACGCCACCAGTTTCTAAAGCTTGGTTAAAATTAATACCATATTTTTC 
GCGGTTAATTGTTCCACTAACGATAAAACCAGTGACTTGTTGTCCATTCAT 

45 TGGATTTTTACTTACACCATTAAATTCAACATCAAATGTCTCTTCATGAGT 
TTCACCTTTAATTGTCAAATCTCCAACTACTTGATTTTCGTTAATTTCTTT 
AGTTACAAATGTCATTTTATCGTTGTCTTCTGTACCAAAGAAATC6TTTGA 
TCTTAAATGGTTGTCTCTGTCCTCATTTTGAGTGTCAATTGAACTTGGAAT 
7VATAGTAGCTGTTGCTTTTAGTGAAGTTAAATCATTAATATCTCCATCTAA 

50 TTGAACATCGAATTGCTTAAATGTTCCTTTTACTTGGGACACCATAAGATG 
ITTAATTTTAAACTGAATATCACTGTGAACTTGATCAAAGTTAAATTTTGT 
CATTATTATAACCCCTTCTCTCTTTATTACAATAATTACGTTTCTTGTACT 
TGTTATTCTAGATACAGTAAGATGACTTATCAATTTTTTTACTCTCATTAA 
ATTTACGGCTTTTTAAACTTTTTTAGTGTGTCTATATTACTATATTCGTTA 

55 AAAAGAAATGAAGTATACATAAGAGAATGATGATATTTGGCAATCAACAGT 
AAATATATCTTCCAGAAAAATGTTGTGAAATTTCACATAAATATAAAAAAG 
AACATGTGATTGTTGATTTGAGGTTGCTCATAGTGAGGATTATTGAATTAT 
AAACTAAAAATATTTCATTTAATACTGAAGTAAAATAACGTATGTCTCAAA 
AAATAATTAAACACATACATTTTCATTTGACATTGCAATTAGAAGAAAGTA 
ACATAACATTAAATCGTAATTTTTACGATTTAGGAACGATATAGAGATGAT
GGTTGGATGAATATAAGATGGACATATCCGTCCCTGTCTATTTTATTTGAA 
GTGCAAACAAAGTGAAAATTCTTCAATACCTCATATAATACATCAATTGAC 
CATCATCTATTATATTGTTCTTTTTATGTTTAGTTTCCTTTTATTTCTAAC 
5 TCTTTTTTTTTTCACTCGTCTTAGGACGGGTGTTTTTTATAGTAGAGTTAT 
GAAATTGTTGAAGGTTCCGTTCGATTATATTCTGAAATTAAATCAATTCAT 
AGACTTCTTCGTATCTATGATGACTTGCAACATGGAATTTCAAATACCCCT 
TAGTACAATGAACATATTTATATAAAGAGTGAATGGGTTAAACTACGTTTT 
TAAAAATGTATAATGAGATGATATTTGAGTTTGTTTTTATATAAAGAGAGG 
10 CAAGTTTGTTTTTTATATAGAAACGGTGTTTATAAAAAATAAATTGTGACG
GAATCGTGTGATTAGTTATTTTAAATTATTGAGTTTAAAAAAGGACAA 



Sequence 3444 

15 step . 1003c08 . cons . ok 

AAATATAGATTGTCGCTTTAACAACAGAATTCAAATCAGAACCTGCTTCTT 
TTAATACCACAGTTAAATTTTCTAAAACTTGCTTAGTTTGTTCTTGAACAT 
CATCGCTAACAATTGTTCCATCAAGTGTGAGTGGAATTTGACCTGATGTAA 
AGACAAAACCGTTTATAACAGTTGCATGCGAATATGGGCCTAGTGCTTCGG 

20 GTACCTTATCTGAGTTACAACAACTTATTCCTAAATATTGTGTTATTTTCC
CTTAAATCTCCTCAAAGATAAATGATTTAAGGGTTTTTTTATGCATAAATC 
ATTTGTTCTAACCCGTTTTACTATCTATAAGATTGTTATTGTAATAAAATA 
TTGACATTAAATACTAACATAACGAAAAATAATTGAAAGGAATGAAATAAT 
GGTACAAAATGCATTTGTTGCACTGGACTTTGAAACTGCTAACAGTAAACG 

25 AACAAGTATTTGCTCTGTAGGTATGGTTAAAGTTATTGATAATCAAATAAC 
AGAGTCATTTCATACCCTTGTGAATCCTTTTGACTACTTTACTGAAACAAA 
TATTACTGTACATGGCATCCATCCCGAAGATGTGCAAGATGCTCCTGGATT 
CAAGCATGTCTATCCATATATGCTAAAATTTATTGATCAACTTCCTGTTGT 
TGCCCATAATGCTGCTTTTGATATGAATGTGTTACATCAAAGTTTAAAAAG 

30 TCATAATATAGATACACCTTCTTTGACATACTTTTGTTCATATCAATTAGC 
TAAACGAACAATTAATGCATATCGATATGGGCTAAAACATTTAATGAATCA 
TTACCATTTAGATTTTCACGGTCATCATGATGCGTTAAATGATGCTAAAGC 
TTGTGCAATGATTACCTACCGCTTATTGAAACATTATGATGATCTTCAAAG 
CATGTTGAGTATATATGGAAAAAATCTCAAAGATAAAGGCTGATTGTCCAC 

35 TTAACTTGTGTTCATCAGCCATTTAAATAAAATCATATATTGTTAAATTTT 
TAAAATCAGATTGCTCTAAACTCCCGACCGTTACACCTATTAATCGAATCG 
GAATCTCTGGATCTTTTAAGTCGTTATATAAAGTATAGGCAATATTATAGA 
TATCAGTTTCAGTACGAATAGGATCTCTCAAACTTTTTTGCTTTGAGATTG 
TTTCATATTGATAGGTCTTTATTTTTACAGTGACTGTTTTACCTGATTTTT 

40 GAATTTTATTTAAACGTTCTGCTGTTTTTCCTGAAAGCTCTCTTATTTTTC 
TTAAT^TAACATCATCATCGTTAACATCTGTAGAAAATGTCCTTTCAGTCC 
CAACAGATTTACGAACTCTACTCGCCTTTACTTCATTATGGTCTATACCTC 
GAGCTTTATTATACAAACCTCGTCCTCTTTTACCGAATAAACGTATTAATT 
CAAATTCATCTTTATTATATAAATCTTGACCTGTATATATATGATGTTGAT 

45 GCATTTTTTTCTTCGAAGCTTTGCCTACCCCTGGAAAATCTCCAATATCTA 
ATTGCATTAATATTTCATGTACATTATTGTAATCAATTACTGTCAAACCAT 
TCGGCTTGTTCATACCACTCGCTAACTTTGCTAAAAACTTATTATAAGACA 
CGCCAGCTGACGCAGTTAAACGTGTTACTTCGTATATATCTCTGCGAATAT 
AATTTGCAATGGTTGATGCTGGTAAATCCGGTCTCACTAAATGTGTAATAT 

50 CTAAATAAGCTTCATCTAAAGACATGGGTTCTACTAATTCTGTATAACTTC 
TGAATATTTTCATGATTTGACCAGATACCTCTCTATAAGTATCAAAACGGC 
TTGTTACATAATATCCATTGGGGCATAGCTTATGTGCTTGAGTCATAGGCA 
TAGCAGAGTGAACACCATAAGCTCTTGCTTCGTATGATGCCGTAGAAACTA 
CGCCTCGATGACTCGCTTTACOaCCAACGATGACAGGTTTTCCTTTTAGTT 

55 TAGGATTATCTCTCATTTCAACTTGAGCAAAAAAATAATCCATATCTATAT 
GAATAATTCTTCTTTCAGTCATCGACTCACCTCACTATGTCTTCCAAATAA 
TGGCTTTATATTATTATACGCCATCCTTTCTATGTTTACATGTATTGTGCA 
CATGGTAATATTTGTATTGTTTATTTAATTAATGTAATAAAAAAAATAGGA 
CTACAATTTTTAGAATAAATTCATAGTCCTATCCTTTTCTACATAAGTAGT 






TATGTGAATGTAGTTGCATATTTTAATGAGATTTTTTATTACAATTGAAAC 
GAGTTGAAAAATAAGAAATGAAATTTTTAAATCAATTATCAACTTTGATAT 
ATACGCTTGTCACTATAATTAGTGATTATTCATAATAGGTTTGAGTAGGCT 
TTTGATGATAAAGATACAAACCTTCTCCATAGGTTTGTATGTATTTGAATT 
5 TGCTTTGTAATGCCAATGCTACTAAAAACACATCTATCATACAATAACTTG 
TATGTATACTAAACATAAAGATTAAAGATGAGAAGGAATATATTTGAATAA 
GTGCTAGTAAGGTCGCACTTATGATAACAAGTGGTGCTATCATGATAATTG 
TAAATTGCCACCGATGAAAACAAGTCTGAGAGAACTGTACAATAATTTTGT 
TTTTATCATATTTTAAAGATGGTTTTTCACCTTTAGAAAATATAATGAACA 
10 AAATACGATGAATAAACTCGTGCAAAATAACTAGTACTGCGAAACCTACAA
ATCCAAAAATAAGATTCATGACAAGATTTTGTTCAATGATATGCGTTGTTT 
GATAAGCCCATTTATAAGTAAATAGTATTGTAAATAGCGCCAATAC 



15 Sequence 3445 

step . 1003c09 . cons . ok

AATAAATTTTTTGAAACGGCTTTGGAATTGATATTCAAATATATCAAAATC 
TAAACCTACCAATAAATAAAAAATTGCCATACAAATTGCCATAACGACCAG 
TATACTTATTTTTTTATTAGCGCTCATTTGCATAATTTTTCCTACCTTTCA 

20 TTAGCAAGATTAGGAAAATAATTGTGCCAAATACACCTATCGTCAATCCAA 
TATTAATTTCGTAAGGGTAAACAATTATTCTACCAATAATATCAGCAATTA 
AAACAAAAATTGCACCGAGCATAAGCGTGTGTGGTAAAGCATTTTTCAAAT 
GATCGCCTCTATAGATAGATATGATATTCGGTACGATTAAACCTAAAAATG 
GTAATGTTCCAACAGTAACAACAACTAATGCTGTTAGCGTTGCTGTTATGA 

25 ATAATGCTATTTTAATGATTTTTTCATAACTTACACCTAAATTATGACTAA 
AGTCTTTTCCCATACCTGCAATAGTAAAATGATTTGCAAATACAAATGCCA 
AAATAAGTAATGGTATTGTGAGATACAACACCTCAAAACGACCACTCGTTA 
TAACTGCAAAGTTACCAGTTAACCAGTTTCCAATGCTTTGTAAAGCATTGG 
TTCTCAACGCTACAAATGTAGTAAAACTGGATAAAATACCACCAATCATAA 

30 TGCCTAAAAGTGGAACAAAGATTACATCTTTTACACGGATAAGATTAATTA 
ATTGGACAAATAAAAACGTTCCAACAATACTTAGAACAACAGCAAATAATA 
ATTTGATTAAAATGGGACCATTAGGAAAGAACAACAATGACATTAAAATAC 
CTAATTTTGCCCACTCCATAGTACCAGCAGTCGTTGGGCTTACAAATTTAT 
TTTGCATCATCTGTTGCATAATTAATCCTGATAAAGCTAGTGAACTACCCG 

35 AAAGTAGAATACTAACTGTTCTGGGAATTCGACTCGAAAACAAAATATTTA 
TTTGTTCATCACTTAAATGGAAAATATCTATTAGAGAGAGCTGACTCACCC 
CTATAAATAAAGAGACAATAGTTAAAATCACTAATAAAATAAATAAGGTAT 
AACCTTTAAAAATAAATTTCATTAGTTAATCTCCTTGCTAAATATTAAAGA 
TGTATAAGGTCAATATACGAAAGTAATGATAATGATAATGATTATCATTAA 

40 CATCAATATTATAGACTCTATTTTCTTTTTTCAATAGTTTTTTCATATTTT 
TTGGGAATTTTATAAATGTTTTATCCAGCAGTAATAATACATCAAAATATT 
TTATAATACCAACTTTCATAAAAAAAGAGATTCCTACTAATTTTAATAGAA 
ATCTCTTTAAAAGTACTATATTAATTATCAAAAACAAAGTCTTCATCACGT 
AGAGGCTCAACATTTAAAGCCAATGTATAGCCATCACCTTTAACAGAGAAG 

45 AAATCGTGGTTCTTTGTAGATGTATCTAGTGCATTTTCAATAATAGGGTTA 
AACTCTCTCTCTTCGAAATATGGTTCAAAACCTAAATTCGATAATGCTTTA 
TTTCCATTATATCGAACATAATTTAATACATCTTCAGCAAGACCAATATCA 
TCATATAATAAATGTGTATATGAAACTTCATTATCATAAAGTTCATTTAAT 
AATTTGTACATTTCTTGATCAGCTTTTTGTTTCTCACTTTCAGATAACTCA 

50 TTACGTAGACTTTGTGCATCTAAACCTGTGAACACTCCATGTATAGATTCA 
TCTAAAAGTATCTTACGTATAATTTCACCTGACGTAGTCATTTTTCCTTGT 
CCTGCGAGATATAATGGATAATAGAAGCCAGAATAGAATAGAAATGTTTCT 
AAGAATACACTAGAAACACGAGCAATATATTGATCGTAAATCGATGCTTCT 
TTACCCCAAAGTTTGTGGTAATTTTCTACAATTTTATCTGATTTATATTTT 

55 AAATGTGGCTCTTCAATAACCCAAGTATCCAATAAATAGTTGGTTTCACTA 
GATGGTAATAATGTAGTGAAGATATGAGAATAACTTTTCGCATGGATTTGT 
TCCATCATAGCCATAAATGAATAAACAGCTTTCTTTCTTAAATCAGTAGTA 
TGAAGCATGATTAATGGCATACCATCATCAGCTTGATGTGTATCTAAACCT 
GTTAAACCTGCAAGCGCTTTTTTAAAAGTATTTTTCTCAGAATCTGTTAAT 
GTTTTCCAACTTGCTATATCTTTTGATACTTTAAATTCTGTTTCCACCCAC
ATTTGTGAGATATTTTGACGCCAGAACATATTAGTCATATCTTCTTGGGTG 
TTCCAATTTACCGCCTTCATTAAAATCCTCCTGTCATATTTATGTATAAAA 
TAAAGAGTTTTGAACCAATGTAAATCTATAATTTAAAGAAATACATCAATT 

5 CATCACTCTACTCTAAAATCTTTACAACTTAATCAATTAAATCGCACAACT 
TGTACATTCTTCCACACTTAATAATTTATTACGTGTATAGTATAAAGATTT 
AAGGCCTTTATGGTGTGCATATACATATAATCTTGAAAGTTCACGTGTTGA 
AATTTCTGAATTAACATAAAGTATCGTTGAAATACCTTGGTCTACGTGAGT 
TTGAATTGTAGCTACTAAGTCAATAAGTTTCATTTGGTCAGTATTAAATGC 

10 TGATTTGTAGTACCACATTGTTTCTGGTGATAAAAATGGCATTGGATAAAA
TGTTTCCGCATTTCCATACGTACGACGTTCAATTTGGTCAACAATAGGCAT 
AACAGAACTTGTCGCATTTTGAACATAAGAAATACTTTGAGTCGGAGCAAT 
AGCTAATCTATAAGCGTGATAAAGTCCATATTGTTCTACTTTATTTTGCAA 
TTCTTTCCAATCATTTGAAGTAGGTATATCGATACCATCAAAAAGTTGGCG 

15 AACCTTTTCAAATTTTGGTTCAAATTCTTGAGATGTATAGAATTCAAAATA
TTTACCATTTGCATAGTCTGATTGCTC7VAAGTCTTGATACTTTTCTCCACG 
CTCTTTTGCAATTTCCATTGAACGTTCGATGGAATAATAGTTCATAATCAT 
AAAGAATATATTAGCAAAGTCTTTTGCTTCTTCAGATTCATAGCCAATTTT 
ATTTTTAGCTAGATAACCATGTAAGTTCATTACTCCTAGTCCAACAGAATG 

20 TAGTTCACTATTCGCTTTTTTTACACCTGGTGCATTTTGAATATTTGCTTC 
ATCACTTACAACTGTAAGAGCATCCATACCTGTGAACACAGAATCTCTGAA 
TTTACCTGATTCCATAACATTCACTATATTCAATGAGCCTAAGTTACATGA 
AATATCTCTTTTAATTTCATCTTCAATTCCATAGTCGTTAATTACTGATGT 
CTCTTGTAAT 

25 

Sequence 3446 

step . 1003c10 . cons . ok

TATATTTCGTGTGATTTTCTTTAACTACTGCACCTGCCACAATTCGATGCG 

30 GTTTTTTCTTTAATTCTTTTAGTTWVAGTAATTCCACGTTGTGCGCGTTTAG 
CTTCTTGAAGAACATTAAAATCAATACGCTTCATAGCACCACGTTGTGTAA 
CCATTATTATGGAATCTGAGTCGTTCACATCTTCTGTCATAACAACATAGT 
CTTCATCTTTAAGATTAATTGATTTAACACCAGCTGCTCTTAAGCCTGTAT 
CCGATAATTCATTAGTTGAATAAGTTAATGACATGCCTTTATGGGTTAGAA 

35 CAGTAATTAACTGATCAGACTCTAATCGAACGACATTAATAAGTTCATCTT 
TGTCTTTAACCTTCATATTTATGAGTGGTTTATTAAACCGAGTAGTTTTAA 
ATTGTGAAGCACTACTTTTCTTAATCATGCCGTTTTTTGTAGCCATAATAT 
AAAAGGCTTCATTTTTAAAATCTTTTTCGTTGTATACATTTACCACTTCTT 
CATCTTCATCTATTGGCACAATTTGTGATATGTGTTGACCAAGCTCTTTCC 

40 AACGGATATCGGCTAATTTATGAACAGGTATAAACAAATATCTACCTTTAT 
TTGTAAATACAAGAACAGTATCTTGAGTATTCACGCTTTCATGTTTTAATA 
AACGGTCGCCGTCCTTCAAACCGATTTCAGTCACACCACTTGCGTTAAAAC 
TACGTGTAGATGTACGTTTTATATAGCCATGTTGCGTCAAACTTAAAATCA 
CTTCTTCACTAGGCACCATAACTTCTTTATCAATTTTGATTTCGGAAATTT 

45 CAGCTTCGATTGTAGATAGTCGATCCACTTTAAATTTCTTTTTAATTTCAT 
TTAGTTCATCTTTAATTACTGCTAAAAGTGCCTCATGATTATCTAAGATAT 
TTCTTAATTCTTTTATTAAAGCTTCTAACTCTTCATGTTCTT^^ 
CTTCAATGTCAGTATTTGTTAATCTATACAGCTGTAACATGACAATAGCTT 
CTGCTTGAGCTTCAGTAAAGTCATACTCTGCAACTAAATTATCTTTAGCAT 

50 CTTTTTTATTTTTAGAATTACGTATCAATGCAATAACTTCATCAAGTATAG 
ATAAAGCTTTCATTAATCCTTCCACAATATGCATACGTTTTTCAGCTTGCT 
CTAAGTCATAACGCGTTCTATTTGTAACCACTTCAATTTGATGATTTAAAT 
AACTTTCTATAATTTCACGTAATCCCATCAACTTAGGGCGACCTTCACTAA 
TAGCAACCATATTAAAATTATATGAAATTTGTAAATCCGAATTCTTATATA 

55 AATAGTTTTTGATTGATTCGCTATTAGCATCTTTTTTTAATTCGATTGCAA 
TTCGTAATCCAGTTCTATCAGTCTCATCTCGAACTTCTACAATACCATCAA 
CCTTTTTATCGGCACGTAATTCGTCAATTCTTTTAACTAAACTACTTTTAT 
TCACTTCATACGGAATTTCAGTCACAATTAATTGTTTACGTCCACTTCTTA 
AAGGCTCTTCATCTACTCGTGAACGCACGACAACCTTTCCTTTACCGGTCT 
CATACGCTTTTTTTATACCTTCTATTCCTTGAATGATACCACCTGTAGGAA
AGTCAGGCCCTTTGATATATTTCATCAGTTGATTAATTGTAATATCAGGTT 
GATCGATATACTTCAATGTGCCTTGTATTACTTCGGCGAGGTTATGCGGCG 
GGATATCAGTAGCATATCCTCAAGAAATCCCCGTAGATCCATTAATTAATA 
5 AATTAGGAAATCTCGCTGGTAATACCATTGGTTCCAAAGTTGTGTCATCAT 
AGTTTGGAATAAATGATACTGTTTCCTTATTAATATCCCTTAATAGTTCTT 
CTGATAATTGACTAAGTTTAGCTTCTGTGTAACGCATAGCAGCTGGAGGAT 
CGTTATCGATACTACCATTATTACCATGCATTTCAATTAGAACATGACGTA 
ACTTCCAATCTTGACTTAAGCGCACCATAGCATCATATACTGAAGAGTCTC 

10 CATGAGGATGATATTGACCTATTACATCACCGACAGTTTTCGCACTTTTAC 
GGAAATTTTTATCATACGTATTCCCACTTGAATACATTGC6AATAGTATAC 
GTCGTTGTACTGGTTTTAAACCATCACGTACATCAGGTAATGCACGTTCTT 
GAATGATATATTTACTATAACGTCCAAATCGGTCACCAATAACATCTTCAA 
GTGACAAATCTTGAATTATCTCACTCAATTCGTTTCCTCCTCAATATATTT 

15 TTCATTCTCTAGTATTTGGACTTCTTTATTATCCAAAATGCTTTGATCTTC
TTGCATACCAAATTCAACGTGTTTTTCAATCCACTCTCTTCGTGGGGCAAC 
CTTATCCCCCATCAAAGTAGTGACACGTTTTGATGAACGAACTTCATCTTC 
AACTTGAACTCTAATTAATGTCCGAGTTTCTGGATTCATGGTAGTTTCCCA 
TAATTGTTCTGGATTCATTTCACCAAGACCTTTATAACGCTGTAATATGAA 

20 ACCTTTTCCTAATTGCTTTTGTAAATTTTCTAATTCTTCATCAGTCCAAGC 
GTACTCAACTTTTTTATTCTTACCTTTGCCTTTTTCTAATTTGTATAAAGG 
CGGTAACGCAATAAAGACACGTCCAGCTTGAACAAGTGGTTTCATATATTT 
AAAGAAAAATGTAAGCAATAATACTTGAATATGTGCACCATCCGTATCAGC 
ATCTGTCATGATAATAATTCTGTTGTAATTACTATCCTCAATTTTAAAGTC 

25 AGTACC/VACACCAGCACCAATAGTATGAATAATCGTATTAATTTCTTCATT 
TTTAAAAATATCCTCTAAACGTGCCTTTTCTGTATTAATAACCTTTCCACG 
AAGAGGTAAAATAGCTTGGAATTTACGGTCGCGTCCCAATTTTGCAGAACC 
TCCCGCTGAATCACCCTCAACTAGATATAACTCGTTTTTATCAGTATTTTT 
ACTTTGCGCAGGAGTTAACTTACCTGATAACAATGTATCTTTACGTTTATT 

30 TTTCTTTCCGGAGCGTGCATCTTCTCTAGCTTTACGAGCAGCCTCGCGTGC 
TTGTTGAGCTTTAATTGCTTTTTTAACTAATGATTTAGATAATTGGCCCTT 
TTCTTCTAAGTAATATGGTAATTTTTCTGAAACAACAGAGTCTACAGCACT 
CCTTGCTTCTGAAGTGCCAAGTTTTGATTTCGTTTGCCCTTCAAATTGAAG 
AAGTTCTTCTGGTATACGTACTGAAATTATCGCTGTTAAACCTTCGCGTAT 

35 ATCATTACCGTCT 



Sequence 3447

step . 1003c11 . cons . ok

40 CAAATTTACATCGACAAACACTCTTCGATGAAGAAGACGCACATACTTTAA 
TATCCAAAGTTGAAGCAATCAAGCATAAGGTTAATCAAATTAATATAAATG 
TCAATCTAACAACTTATGATTTAGAAGTTACTTCATCAAATATCGAAAATT 
TGATAAAAAATGTCGAACCAGACATCATCATTGATGGCATGGATAACTTCA 
AAATACGATACCTGATTAATGAGGTTTGTCACAAGTATCAAATCCCATGGG 

45 TTTATGGTGCAGCTGTTGGTAGTAAAGGATCAGTATATGGAATAGATCACC 
AAGGACCATGTCTAAAATGTTTATTGCAAACAATTCCTGACACAGGGGAAA 
GTTGCGCTATTAATGGCGTAATTCCCCCTGTTATATCAATGATTGCAAGCT 
ATGAAGTAGCAGAGGCCGTACGTTATCTTTCAGGAAAAGGATTTTCAAAGC 
AATTAATCACTATTGATGCATTTAATATCAATTATAAGTCAATGAATGTAG 

50 ATGCACTCAAAAATAAAGATTGCCCAGTGTGTGAAAAACATGAATATACGT 
TACTAGAAAGCCAACAAGAACGTACTATTGAGGACTTGTGTGGGAATGCTT 
ATTTATTTAGATTCCCACCTAAAGCTTTTAAACACGCTGCCCATTTCCCTG 
GGAATATGGTGAAATCTACTTCCTTTGCCAAATTAATTCAATATCAAACTT 
ATGAATTCACCTTGTTTAAAGATGGTCGTATGAATGCATATGGTATACACA 

55 ATGATGAAGAAGCACATCACCTATACAATACGTTGTTAAAATCCATACGCT 
AATTAAAGTAAGATTGTTACCCTTTCAACCTGTTTCTTGTTAAAATAACTT 
ATACTTAACATCAAGTTTACGACAACTGGATTAATTTAAAAGGAGAAACTA 
ACGTTGTATGGAACAAATTATCACTGATTTTATTAGTAAGTGGGGTTATAC 
AGCGATATTCATTTTAATCTTATTAGAGAACGTATTACCTGTCGTTCCATC 
TGAGATTATTTTAACTTTTGCAGGCTTATTATCTGTGAAATCACACTTATC
TATTTGGACATTATTAATCATAGCAACAATTGCTTCATTCATTGGTTTACT 
CATTTTGTATTATATTTGTAGACTTATCTCAGAAGAGAAATTATATCGTTT 
CGTTGATCGACATGGTAAGTGGATGAAGTTAAAAAGTAAA6ATTTGAAACG 
5 GGCAAATGATTGGTTTAAAAAGTATGGTGCGTGGGCTGTATTTTTATGTCG 
TTTTGTCCCAGTACTTCGAGTATTAATTACAATACCTGCTGGCATTAATCG 
AATGAACGTTATACAGTTTACAACTTTATCTTTAATAGGTACTACAATTTG 
GAATTTTGCTTTAATACTGCTCGGTCGTTTGCTCAGTGACAGTTTTGACGC 
TTTGATGAATGGTATTCATACATATTCACGTATCATGTATGTCATTATTAT 

10 TATTGCAGTCATATATTTTGTTATACGTTATTTAATGAAACGTCGTCGGAG 
TGTTAAATAAATGTTATGCTAAGTTTTCCGTTTCTTAACGGTTAACTTAGC 
TTTTTTATTAATTTTAATTTTATGATACGCAATTATTGAAAATTAATTGAT 
TTTTCAAGATAATCTATGTAAGCGCATTATTATAAATAATCGTATCGTTAG 
GTCGGAAAATATAATTAGAGGTGACTATGCATGACCCATATTACAGAAAGT 

15 GAAATGAAACAAAAATATCTAGATTTACTCTCACAAAAATTTGACAGTGCA
GAAAAACTTGCTACTGAAATTATTAACTTAGAGTCAATCTTAGAATTACCT 
AAAGGGACTGAACATTTTGTTAGTGACCTTCATGGTGAATACGAATCTTTC 
CAACATGTTTTAAGAAACGGATCTGGAAATGTGCGTGCTAAAATTAATGAT 
ATCTTCAAAGATAAATTATCCCAGCAAGAAATCAACGACTTAGCAGCATTA 

20 GTATACTATCCGGAAGAAAAACTAAAATTAGTTAAAAATAATTTCGATTCA 
ATCGGAACATTAAATATTTGGTATATTACAACCATTCAACGATTAATTGAT 
TTAATTACATATTGCTCATCAAAATATACACGTTCAAAATTACGCAAAGCA 
TTACCTGAACAATACGTTTATATTATTGAAGAGCTACTTTACAAGAGCAAT 
GAATTTCATAATAAAAAGCCTTATTATGAAACATTAGTTAACCAAATTATT 

25 GAATTAGAACAATCAGATGATTTAATCATTGGCCTTTCCTATACTGTACAA 
CGTCTAGTCGTAGACCATCTTCATGTCGTGGGCGATATCTATGACCGTGGT 
CCTAAACCTGATAAGATTATGGATACATTAATAAATTATCATTCTGTAGAT 
ATCCAATGGGGAAATCATGATGTATTATGGATTGGCGCCTATGCTGGTTCA 
AAAGTATGTCTTGCTAACCTTCTACGTATCTGTGCACGTTATGATAATTTA 

30 GATATTATTGAAGATGCATATGGCATCAATCTACGCCCTTTACTTACGCTT 
GCTGAAAAGTATTACGATGCTGAAAACCCAGCGTTTAAACCTAAGAAACGA 
CCAGATAAAGACGTCAGTCTTACAAAACGCGAGAAAGTCAAATCACAAAAA 
TTCATCAAGCAATTGCGATGATTCAATTTAAACTTGAAATGCCTATTATTA 
AGCGTCGTCCTTCTTTTGAGATGGAAGAACGTTTAGTTTTAGAAAAAATAG 

35 ATTATGATAACAATGAAATCACTATATATAATAAAACATATCCGCTTAAAG 
ATACTTGTTTCCA 



Sequence 3448 

40 step . 1003c12 . cons . ok

TGAGTTCTAATTGGCAATGATTTAAAACAGATTGAACAATGTGAGACACAT 
ACTGTTCTTGTATTTCTAAGCTTTCTGCATGATTTGTAAAGGCCATTTCTG 
GTTCAATCATCCAGAATTCAATTAAATGACGGCGTGTTTTAGATTTTTCTG 
CACGAAAAGTTGGGCCAAATGAAAAAACACGTCCGTGAGCCATTGCGGCTG 

45 CTTCCATATACAACTGCCCACTTTGTGATAAGAATGCATCTTCATCGAAAT 
ATTTTGTATGGAATAACTCACTTGTTCCCTCTGGTGCACTTGCTGTTAAAA 
TAGGTGGATCAATTTTAGTGAAGCCATTTTCATTGAAAAACTCATATGTTG 
CACGGATAATTTCATTTCTTATTTTCATGACAGCATGTTGTTTTTTTGAAC 
GTAACCATAAGTGACGATGATCCATTAAAAATTCTGTTCCATGATTCTTTG 

50 GTGTAATAGGATAATCGTGTGCTTCATGTACAATTTCGATTGATTTAACTT 
GCATTTCGTAGCCTAAATCAGAACGATTATCTTCTGTAATCGTTCCTGTGA 
TGTATAAAGATGATTCTTGAGTTATATCTTTTGCTAGTTGAAATGTTTCTT 
CATCTACTTCAGATTTTACTACTACTCCTTGCATAAATCCTGTACCATCGC 
GTAATTGTAAAAACGCTATTTTACCACTTGAACGTTTATTAGTTAACCAAG 

55 CACCAATTGTTACTTCTTGGTTAAGATGTTTTTTCGCTTGTTTAATCGTAG 
TTTTCATAACCAAACTCCTATATTTTTCTTTTTAAATACTTATTTATTTTA 
ACAAAATATGTCTTTAACTTCTACAAGTTGAGTAACTGGAATCATTCGATA 
TTATTTATTATATATTTCTAAAACGTTAGCTTTTACTCTCATTATCGATAG 
ACCTCTGTATTTGATTCAGTAATTTATTAAACTGTTTAATATTACCTTTTC 
GTTGTTTAAAATGTTCTAGTGATTGAGCAAAAAAGCTTTTATATGTGCTAT
TTACTAGTCGATCATCAAATGACACAATAAGTCCTTTGTCGTCTTCGTGTC 
GAATAAGACGTCCAAGACCTTGTCTAAACCTTGTTACAGCATCAGGTAATA 
CATAATCTTTAAAAGTTGATGTGAATTCAGAATCCATAAGCCAGTATTTAA 
5 TATTATGTTTATTCATAAATGGTAATTTAGCTATCATTACACATTTAAGGC 
CATTGGCTTGAAAATCAAAACCTTCAAAAAATGTTGATGTTCCAAGAAGTA 
TAGATTTATCAAAGTTATTAAATTGTTGAACAATTTTATAATTTTGGTTTT 
GTTGTTGGGTTAAAATGACATAATCTTCTAACTCAGGTAATTCATTTAACA 
AATCTTGAACCATATGCATCATTTTATAGCTTGTAAATAAGACAAGACATT 

10 TCGACTGTGTCACTGTGATGTATTCAACAATATAGTCTACAATTGAAGCTA
CATAATCGTCTAAATTTTTATAATTGTACGTTTCAACATCATTGGGCACAA 
AAACATTTGTATGGTTGGATGATGTTAAAGGTGTTGAAATTTCAAATGTAT 
TAAAGTCTATATCTTCATTAAACCAATTTTGAAATGCTTTAAATGAGTGAT 
TGAAAGTTAATGTGCCAGATATAAATGTAAGAGACTTAAATTTTTCGAGTA 

15 CTTGTTTCGTCAAGATATCTTTAACATCATAATCCTTGACTAAGAGTCGGA
TTGTTGATTTTTGTGCTAAATTTTTAATTGAAATAAAGCTTGTATGATGAT 
CTTTTATACTTTGTTCGATAAGTTTAAATTTGTCATGTAAGTATAATAATT 
GTTTCCGTACAGACTTGATTGTTTTGTGATTCATTCCGTTAAAAATTTCTA 
TCGTTTTATTTAATTTATCTATGATTGCACGTAAATCTTTTAAAATCTCAC 

20 CCGTTTCAAAGTCATAAACGTAATGATACTTATGAATGTCATCATCATAAA 
CGTCTGATGTTTGAATAATATTATAAATTGTAGTGAATAGTTGCTCATTTA 
AATCATGTAACTCATTGATATTTATTTTCAGTCCAAAAACGTCTATAGGTG 
CAATATCTAGTTTCTCTAAAATACGTTGTTGCTCAAGTTTGTCTACTGCTT 
TAAGTAGTTTTTCATTTTCATTTTTGCCAATAAGTCCTAATTGATATTTAA 

25 CATCTGAATAATTTAAATCATTAGTAACTTGATTTA6CGCATAGTCAGGCA 
ATCTATGTGCTTCATCGATGATGCAATCATCAAATAGTTGATATATAGTGT 
TTTCACTGTCTGAATGAATTAAGTGCGCATGATTAGTGATACCAATTTGTA 
TGTTTTGAGCATTTCTTTTTATATAATTATAATAATGGATATCATGACGAA 
CTGGAACGTATGTTTCAATTTTTTGGTCAACATACATTTTTTGTCCACCTT 

30 TAAGATTTAATTCCTGTATGTCCCCAGTATTTGTTTCAGTTATCCAAATAA 
GTAACTGCATTTTAAGAATACTTACTTCATAATTATTTGTATCGTCTTTAA 
GAATTTGGCTGATAAGACCAAGAGATATATAATCATTTTTACTTTTGATTA 
ATGACGCGTTAATTTTAAAATCTAAAACATCATTGAGTAATGGTATGTCTT 
TCTCTAATAGCTGACTTTGTAATAATTTTGTATTTGTTGAAATCATTACAT 

35 GACGACCGGTCTCAATATTATACATTGTTGCTGCAAGCAGGTAAGCAAGAG 
ACTTTCCACTACCCAAAGGAGCTTCAATCATTGCCTTATCACTATGCATCA 
ACTGATCAAGTATAATTTCGGCTAGGTATAACTGTTGTGGCCGATATGTAA 
GATTTAATGATTGAGTGACATTTTTATATAAATCTTTTAAGGTACCATCAA 
AATTGACAGCCGGTTTTTTTAAATCAATTTGTTTACGGTAAATAATTTGTT 

40 CAAATTGTTCAAATTGGTTGTTAGGCGGTTTAGTTTGATAATTTCTAACCA 
TTTCAAACAAAATATGATAAAGATCATACTTGAGATTTTTACTTA7VATAGT 
ACAGTTGTTTTTGTGTATCTAAATGCAATTGCTCGAACTTCTCAAAAGCTT 
TAATCATCAATTTAGCAGTTGTTGTTGCATCTTCATCTGCTCTATGTGCAT 
TATTTAATGGTATATGATGAGATTCAGCCAATGCACTTAACTGGTAGCTTT 

45 TGTCTGTAGGGAATGCGATTTTAAACAATTCTAAAGTATCCATTACTCTTT 
TAGGTTTAAATTGAATATTAC 



Sequence 3449

50 step . 1003d01 . cons . ok

AGCTTCAGCAATCAAAAATGGTTGGAAAATGAATGACTTTGTAGGACCTGT 
TGATCATGGTGCATATAACCGTATTGAACAGATTAACGTTGAAGTCACTGA 
GGTTTAAAATACGATATATATAAATACTAGTACATTTTACCCCTCATCTAT 
AGTCTCATTTAATTCCTACAAACATTGTAATTATAATTAAATCAATATCTA 

55 TATAGATGTGGAAAGCCTGGAACAAAAAGTTCCAAAATAAGTAAATAAAGA 
CAATTTCTATTGAAATAATATAGGAATTGTCTTTTTTATAATTTTTTTGAT 
GATTTTCAGCTCGTTGAGCTATTACTTTTCTTATATTAAGCCACGAGGATG 
TCTGAATTCATCGAATTCAGTATCGTGAATTGTTTCTACAATATCATGTAC 
ATATCGTGAAATATCATTCGCGGGGATAAGTACTGAAGTTTCTATTGGTAG 
AGTATGTTAAGTCATATTATAATCTTTATACATAAGACACCTCGTTAATTT
AGTTTAGTGGTATTTATTAAATTATACGAAAGCACCTTATTTTTCAAAGTA 
TTTTAATATAAAATTACATATAAACACAAAGTATTTTGGCGAGACTCTTGA 
GGGAACAGGACAAGCTGAAGACTACAGGCTGAAGCTGTCACCTAAGAAAGC 
5 GAGCCAACAATACGAATTATTTTAAATAAAGATGCCAATAAATGAATTTAT 
GAAAATCCATTTACTGGCTATTTTGCTAGGAATTATGTCTCAGCGTCTCTT 
CTCTTTCCTTTCAATCCATAATAAATATTGTATACATATAAAGTCTTAGAA 
AGTAATCAGTTTATCGTCGTATGTAAATTTGATACTACAGACCCATGATTT 
CTTATGATTTATCTTTTATTATTTAAAGTGTAGAGAGGCAAGGGAGGTATA 

10 AACAACATGATTATCTACAGAAGAAATATAGAAAATGGAACACCCATTTAT 
GAAATCATAACTAAAACTTTCAAGACAATTACTATAAAGTGTGATGAAACT 
TTTAATAAGTATGAAATCTATCAATTGCTCTCTCTACTAGAGAATGACGTT 
GACAACATGCCGACAAGTTACTCATATCGTTAACGTTATTCTAGACAACTA 
GACATTGCATCAAACCTAATTGAGATTTAGCACTTTTGTCTTTCCAGTCCC 

AAATAAAATGAGTTTTAAAGCATTTAGTCCCTTAATACGTTTAGCTTTAAA
GACATTCTATCATCCCTAAATTAACACTCACCTTAACAAGTGAGAAACGCT 
AATGTTAGCCAAAACTTGTCCCCTTTTGTAGCTCGGAGAGT^CCTTTGAGC 
ATTGCTAACATTGGCX3TTTTTAATTATTTAATTTAAAAAAAGATTATTATT 
TTTATCATTCTCAACTTCTAGTAATTGTTAGTTTGGTATCATCATATGTTT 

TAAAAACTACAATTTCATCATTTTCTTTTATTAAAAATTGTACTTCTTCAT
CTATATTTATAGGTTTTAAGTACTTAATTTTGTAATTCATCCAATTCACAT 
TAAGTTCAGAAAATACTTTTTCACATATGAGTTGTCCTGGTACAATGGAAC 
GATGGATCGGATTCTCATCATTAACTATAGTTAAATATTCATTCACCATTT 
CTATTGTAAATTTCATAATTTATCTTACCTCGATAAATATTTGTTTTATAT 

ACATACAATGTTTTTTATCTTTATTGATTTCGAGTCCATATGTATATTTAA
TGATATTTTTAATTAATTTTTGTGAAACCTTATGCAATGTCGCTTCATACG 
TACAATCGACTTGAAGTTGATGTTGTGTCTTAATGTGTGTCTCTTTTAATA 
TGATAGGCTTCTTTGAAAAAGATTGAAATAACTCAAATTGTGGCCATT^CT 
TGGCACACATAAGAGTAGGAACATAATGGTCATACTTTATATTAAGCAATT 

CGCAATAATTTTGAACTTCTTCATTTGTAAATTGAACTTCTCTAATTTCCA
TACCATCTTTCAAATAAAGCTGCATTTCCTATCCCTCCACCTATTCCCATA 
GTTGCTATAGTTCTAAATTGATGTTTCATATAAAATAAACGTGTGACTAAC 
GCTGCTCCACTTGCACCATATGGATGCCCTGTAGCAATTGCTCCTCCCCAA 
CMTTCAACTTATTCAAAGGAATGTTAAGCTGTTGTTGGCTCGCAATAACT 

TGAGAGCTAAATGCTTCATTTAATTCTACTGCATTTATATCATTTATAGTT
AATCGTTCTTGAGCTAATAATTGATTTACTGCTGGCACTGGACCTACTCCT
AAATACTGTGGTTGAACACCTACAGTTGCACTATTCACAAACTTAATCCCT 
TCTGTGAATCCTAATTGACGTGCCCGATTTTCTTCCATAACAATCAGTAAT 
ACTGCACCATCATTTTTCTTACAACTATTTCCTACTGTGACTGTTCCTTCA 

TTTAAAAGTGGTTTAAGTCTGCCAAGTGTTCTGAGAGTAAGTTGAGGTTTA
ATACTTTCATCTTGATTAAAATATTCACCTTTCACTTTGAACGGTAAAATT 
TCTTGGGAAATATTACCGTTATTCATATTTTTTGATGCCAACTGATGACTG 
CGATACGCAAAGTCATCTTGTTCATTTCTACTGATATGATATTTCTTCGCT 
ACATTTTCGGCTGCTTCAATCATTGAAGGGTCTTCTCCTTCTCTTGCAAAA 

GGCGCCCGTTCAAAAAATTGTGGAAACTCAGATTCATAAACTGACTGCGGA
CGTTTGATTTTCCAAGGTGCTCTACTGGTACTCTCAACACCACCTGCAATA 
TATATTGTTCCAGCACCACTTTGTACCATCCTACAGGCTTGTATAACGGCT 
TCAAGACCTGAGCCACATTGACGATCAATTGTTATACCAGGTATTTTAAAA 
TCTAATCCCGCTTCAAGTAATGATTTTCTAGCTAAATTCCCCCCATTACCT 

ACCGTATTACCTA



Sequence 3450 
AATGCTACGGCTCAACTTCGTCAAAACCATTCAAAGAGCCAAAGTGATTGG
AATCATCAACAAAACCAACAACAGAAAGATGCTTGGGATCCTGAACAAGAA 
ATCAACAAACAAAAAAAGGACGATCCATATATATATTAATAGGTTTAGATT 
GATGTTTGACTATTAAATGTTGAATGCAGTCCATAAAAATACGCCCTATCA 
GTAAAAATAATTGATAAGGCGTATTTTTTATAAAGTAAGAAAGGGACTATA
CAACTTGAAGTTAACTATATGAATAAATAGTGATAGAAAATGACTTTTTAA
TTCGTCATTTCAAAATAGAGGATACATTTTCTTTGTTATTCATTTAATCAT 
AATTCTTGGAATGTTTGAATGATAAGATATACGTTAAGTCCACTTAAGACA 
ATTATGAGTAACATATCA/IAAATATGTACACCAAAGTCCTTAGCATTTTGT 
TGTAAAGTTGTAATTTGTTTATTAGCAATTTCATCTTTTATATTAAAAATA
TCAATTGTGGGAACGTTATGATCTAAAGTTGCAAAGGTTAGATCAGGTCTT 
CTGAGTTTACGATTTTGTATTCTAAGTCCTTCAAACGCTTGAGGAGAAGTG 
ACTTCATGAATGAGATGTAAATCAATGTATAATAATTGTGGTTCACCTTCT 
TTTCCATGAAGCACATGTTTTTTCCATACTTTATCAAACAGTGTTTGACCC 

ATAATTTTCTCCTCCTCTTATAGATATTTTTCTTTAAGCAATTTAAAAATT
TCTGAAGTTCGATATTGTCCACCTAAATCTGCAGTTGTCTTATTAGATTGA 
ATAAACGAGTAAACAATTGACTCAAGTTCGTTAGCAGCATCATTTTGATTT 
AAACTTTCTCTTAAGCAAAGTGCTAAAGATAGAACCATACCAAATGGATTC 
GCTTTATCTTCATTAGCTATATCTGGTGCTGAACCATGAATTGGTTCATAA 

AGACGTGTACCTGTTTGACCAAAACTAGCTGATGGAGAAAGACCTAGAGAC
CCTGGTATAACAGATGCTTCATCACTTAAAATATCTCCGAAAAGATTCTCT 
GTTACAATCACATCAAATTGCGTAGGTTGAGTGATCAGATGCATGCTACAA 
GCATCAACTAGCATATGATTAACTTCTACTTCTGGATAATCCTTTTTTACA 
TCGTTTACTATTTGTCGCCATAACTTACTAGAGGATAGAACGTTTTCCTTA 

TCAACAGAAGTTAATTTTTTACGTCTACGATTTGCAAGATTAAAAGCGATT
CTCACAATACGTTCAATTTCTTGACTTGAATATTTTAATGAGTCTAAAGCT 
TCGGTTTTTTTAACATAGCTTGGTTCACCAAAATAAATCCCGCTCGTTAAT 
TCTCTTACTATAACTAAATCTGTGCCTTCTACAATATCTTGTTTTAGTGGT 
GAAAGATGACTAGCCCCTTTAGTTACAAATGTAGGTCGAATATTAGCAAAT 

AAATTTAAGGATTTCCTCAGTTTTAACAAACCGTGTTCTGGACGATTATTT
GGGTCTGTCCATTTAGGCCCTCCAATCGCGCCTAATAAAATAGCATCTGCA 
TTTTTACATGATTGTAATGTTTCGTTAGTTAAAGGTGTACCATAATAATCT 
ATTGAAACACCTCCAAAATGATGACTTTCCAAATGGTATTCAAAATGATAT 
TTTTCACTTATTAATTTTAAAAGTTCTAATGTACCAGATAGAATCTCTGGT 

CCTATACCATCTCCAGGTAAGGCAACTATTTTATAACTCATGTACGTATAC
CTTCTTTCAACTCATATTCTGAAATATATTTAGCATGAGCATCGATATATG 
CTTTACATGAAGCTTTTAATATATCGTGGTCTATACCTATGCCTGTCACTT 
CAATATGATTAATAATGATTCGTACATGTACTTCTGCTTGAGCATCAGTAC 
CTTCTGTTACAGAATCAATACGATAATCAATTAATTCTGCGTCTTTCTTGA 

AAATTCGGTCAACAGCATTATAAATTGCAACAATTGAACCCGTTCCAATAC
TTGAATCTTGTTTAACTTGTCCGTTTCTTTCCTTTATAACTACTACTGCAC 
TTTGTAGACCTTTAGATACGTATTGAAGTTGTAAGTTATCAAGTTGAAAAA 
TAGCATTGTGTTCATGTTCGGAGCCATGTATAATCGCATGAATATCTCTAT 
CGGATACATTTTTTTTCTTATCGGCAATTTCTTTAAATTGTTTAAATAATG 

TAACTTGATCTTCCAATTTAATTTCATATCCCAGAGCTTTAAGCTTTTCGG
CAAATGCATGTTTACCAGACAATTTACCTAGTGGCAATTCTGTTGTATTCA 
CACCTACAAGTTGAGGTGTCATGATTTCATAGGTTTCACGATGTTTAAGGA 
CACCGTCTTGGTGAATTCCGGATTCATGACTAAATGCATTTTGACCGACTA 
TAGCTTTATTTCTAGGTACACGGATACCAGCATATCTTGAAATTAAGTCAG 

ATGTTTTCTTAGTTTCTTCAAGGTTAATTTGAGATTCAAGACCATAGTGGT
CCTTCCTTACATATAAAGCCAAAGCGACTTCTTCAAGTGAGGCATTTCCTG 
CTCTTTCACCAATACCGTTCACGGTACCTTCAATACGTCTAGCTCCACCTT 
CAATAGCTGCTAAACTATTAGCAACTGCCATTCCGAGATCATCATGACAAT 
GTGCACTAAAGATAATTTTAGAATTTGACTTAACGGCCTGTGTTAATTGTT 

TAAAAATTTCGCCATACTCTGTAGGATAACTAAATCCAACTGTATCAGGGA
TGTTGATAATTGTGGCTCCTGCGTTAATCGCAGTTTGAACACATTCAATTA 
AAAATGGAATTTCAGTTCTTGTTGCATCCTCTGGAGAGAATTGTACGACTT 
CAAAAAATTGTTTTGCATAAGAAACGTGTTCTTTTATTGATGTTAAAACTT 
CATCTTGAGTCATTTTTAATTTATGTTCTAAATGAATAGGGGAGGTTGCAA 

TGAATACATGTACTTGAGGTTTTACAGCTTCTTTAGTGGCTTCATATACAG
CATCAATATCAGATTTTTTACATCTAGCTAAAC 
GAGAAGCAACAGTAATATCCACAACCTTTAGTGTTGTATCTATTACGTTCG
CTATTCTCATTGCCGAAACAATTAGAATGCAAGATCAATTTTTCTATTTTT 
ATTTAACAGTTGTCATTTCATGCTTAATTGCAGCAATGATTATGCCAAGAA 
TTTGGCCACTTAAAAATATTCCTGACGAATATGCTAAAGAAGTAAGTGAAG
AGGCTCGTAATGAACAGCTACCAGAAGGCAAAACAGCATTAAAATATGGTT 
TTGATTTAGCAACTGAAGTTGGAATTAAATCGCCAGGGTTTAAAGAATTTT 
TAATTTCAGGTTTTAAAACAGTTGTAGATATGTGGTTTGTAATTTTACCAG 
TTGTTATGAGTATAGGAACAATAGCTACCATTATTGCTAACTACACGCCTG 

TTTTTGAAATTATAGGAAAACCATTTGTTCCAGTACTAGAATTGTTACAAA
TTCCAGAAGCACATGAAGCATCACAAACAATTTTAATTGGGTTTGCCGATA 
TGTTCTTACCTTCAATTCTTATTGAAGGGGTTCAAAATGATGTAACACGTT 
TTGTAATTGGAGCATTGAGTATCTCACAACTTGTGTATTTATCTGAAGTAG 
GCGGCGTGATTCTTGGTTCTAAAATTCCAGTTAGTATAAGTAAATTATTTA 

TGATTTTTTTAATTCGTACTATCATTACGCTTCCAATAATTGCTTTATTAG
CGCATTTATTTATCGGATAAAAGATATACTCATTTTTAATTTAGTGAGAGT 
GGGGGGAAGAAAATCATAACAACAGTAATATAAATTACTAATGGTTGAAAT 
AAAAATTCTCATATATCAAACTTGTTCTAATTTTTATAATTATAAAAATTA 
GACAATAAATCATATACGTATTCTAGCCCACTCCCTTTAGATAATTTATCT 

TCTGTTCAAAGAAGATAAAACATTTGATTTAACAATTTCGCTTTTAATGAC
TGGTTATTTAACATTTATAGTCCTCCAATTATTTACTTAGACATTTATATT 
ATATAGAAAATAGCGCTTTTTTACTATGATAGATGTACAATCAAAATTATT 
ACAATTTTATAAATTCATTTAAAACTGCATATACGATAGCAATTCAGGGCA 
AGAACTATTTTTTTAGGTAGCCCTCTTTATTTACTTGTCTTGCACGCGCTA 

ATGCTAAACTATCTTCAGGGATATTATCTGTAATTGTGGATCCAGCTGCAA
TAAGAGTATGATTACCTACTGTTACAGGTGCTATAAGGTTCGTATTACATC
CTATAAAAGCATCTTTACCTACTATTGTTTTAAATTTGTTAGCGCCGTCAT 
AGTTTACAGTGATAGAACCACATCCGATATTTGTACGCTCACCTATTTCAG 
CATCTCCAATGTAGCTCAAGTGTGATACTTTAGCACCATCTTTAATATCTG 

CTTTCTTCACTTCAACAAAGTTTCCTACTTTAACTTCAGAACCTAAATTAG
ATCCTGGGCGTAGTTGAGCGAAAGGTCCAACTGTTGTGTTCTCTCCAACAA 
TAGAGTCATTGATAACCGATTGTTTTATGTTAGCATTCGAATGAATTGTGC
TATTATTAATTTCAGAGTATTGGCCTATCCACACATCTTCTTCAATCGTTG 
TATGACCTCCGATGCGCACGCCCGGTTCAATAGTTGTATCTATTCCAATTT 

TCACATCTGTTCCAATAAATGTTGAACTAGGATCAATGATTGTCACACCAT
TTTCCATATGATAGCGATTGATACGTTGTTGCAAAGCCTTTTCAGCTTCAC 
TTAACATCAAACGATCATTAACACCAATGATTTCATCAAAATCTTCGGTAC 
AATAAACTTCAGCTTTACCTCCGTCTTTTAAAATTAAAGACAAAACATCAG 
GTAAATAATATTCTCCTTGAGCATTATCATTTTTAACTTGTTCTAACTTCT 

CAAATAGTACTCGATTATTAAAGGCAAAAATACCTGAACTAATTTCTTTAA
TCGCACGTTCTGAGTCATTTGCGTCTTTCTCTTCAACAATACGCTCTAATA 
TTCCATTATGATTTCTAATAATTCGTCCATAACCATAAGGATTGATAGTAG 
AAGCAGATAATACAGTAACATGTGATTGTGTACTTTCATGATGTTCAATAA 
GTGATTGTAAAGTTTGGTATGTAATAAGTGGTGTATCTCCACATACTACTA 

GAGTAGTTCCTTCTTTATCTGCTAAATGTTCATGTGCCATTTTCACAGCAT
GAGCTGTTCCAAGTTGTTTATCCTGAAAACTATATAATGATTGATTACCCA 
ATGTATCTTTCACACTCTCAGCGCCATGACCAATAATAGTTACAATTTGAT 
CAACGCCAGCTTGTTTTACGTTGTTAAGCACATGTTCAACCATTGGTTTGC 
CAGCAACCTCATGGAGCACTTTATATTTCTTTGATTTCATTCTTGTGCCCT 

TACCTGCTGCCAGAATAATCGCATGTCTTTGCATGAATGCTAACCCTCCAT
TAAAATTTACACTTCATAGTTATTATAGTTGAATAACAAAGAGAGTTTCAA 
CTTATAAAAGGTTTTGTTAATCATTCATTAAAGGTTTGATAATTTTAGAAA 
TTCAAATAAGTTGTTATGCTTTAATAAGGAAAACGAGGTCATCAAACCCAA 
CAGTTACATTACATATTCAACCTCTTTATTAATCATTGGTATTTTATTTCA 

GTGTCTAAATCATCATATTATC/^TCATAATTTATGACGATAAAGTTATAA
TCATACAGACATATTAATTACGTCTATCATAAGTAAATTTTTATAGATGAA 
TTGTTGTCAGAGAAACATATCAGTATTTTACTTCTATAATTCTTATTTATC 
GTTAAATGTTTATTTCGTTTTCGTTTATGGAAATACAAAGTACATTAAAAA 
TAACGGAGGGACTACAACTTCAATAAGAAGTCGCAGTCCCTCCCGAACAAT 
GTAGCCTAATGGCACATTATAATTATATTCAGAAATTTTGATTTTTCCATT
CTTATATACTTTTTCAACCACTGCTACGTGCCCATAATAGCCTCTCGTTGA 
TTGCATAACAGCGTATTTTCGAGGCTTATGTCCAGTTGCATAACCAGATCG 
ACGTGCATTGCTATACCAATTCTTAGCATTACCCCATCGATTTGATACAGG 

TTTACCTAATTGAGCACGACGTTTAAATGCCCACCAAGTGCATTGTTTTTT
ACCATAATAATTATGTGCAGCATGGGAAATATGCGTTTGTTCAATAGCTTG 
ATATGAGAGTGTTCCAGTAAATGTAATCATTGTAGCGATAATAATACGAGA 
TAATAATTTTTTGAATTTCATTTCTTTCTCCCAAAAAATGTATTTATTTAA 
TGACAGTAATAATAATAGTATTTAAAAATGCGCTATACAATATAAGAAATG 

AATTATCAAAGTAAATGTAACATTAATGAAATATACATATTTTTATCGTAA
TAATCGTTTGAGTAAAAAAAGAGTGAGACTACATTCAAATAAAATGTAATC 
TCACTCAGGTACGAGATGATTAACTCCATGTATATAAGTATGCGTAAATGT 
TAAAGTCATGTCATAACATCAATTTATACTTTAGTGAATTGGTTGAAAACA 
GCTAGTGCCCGTTTGATACTCGTCACTGGTATGTTTTCCGGACATGGATGA 

TTTCGAAAATATGTCAAGATTTCTTTTTGGCTCGTCACGTCTTTAGGGAAA
TTGATATCTTGATTAATCCAATCAACCAATTCACCTAATGGCGTGTCATCA 
CCTAGGAAATTTTGCATAAATTCGTAAAAGCTCAATATCTCACCTCATAAT 
ATATTTTATGAGCCGACTTTTATTTTAACATATTTTCGTTTTTTAAAATAC 
TTGAAAAACTAAATAAAATGTTTTGAATTATAGTTTTTAAATATATTAATC 

GCTAGAAAGGTCAATTTTGTTGTATAATTAAACTTACTATTGAATTATATC
AAAACTAGGGGGATATGCATGGATATAAAACACATGAAATACTTTGTTGAA 
GTTGTCAAACAAGGTGGTATGACAAACGCTTCAAAATCCTTGTATATTGCG
CAACCAACTATCAGTAAAGCAATTAAGGATATTGAAGCAGAGATGGCTGTC 
CCTTTATTTGACCGGAGTAAAAGAAGTTTAGTACTTACTGATGCAGGTAAA 

ATTTTTTTCAAGAAATGTCAAGAAATCATCGCACTATATGATAATTTGCCC
ACTGAAATTAATAGTTTGTATGGTTTAGAAACAGGTCATATCACTATTAGT 
ATGTCTGCAGTGATGAGCATGCGTAAATTTATTGGCGTATTAGGAGACTTT 
CATCAACTTTATCCGAATATTACGTACAACTTAATCGAAAGTGGTGGTAAG 
ACGACTGAAAACCTTATACTTAATGATGAAGTGGATATTGGTGTGACAACA 

TTGCCAGTAGATCATCAAAAATTTGAATGTATATCTTTAAACAAAGAAGAA
CTGACTGTAGTTTTAAATAAAGAACATCCTTTAGCACAAAAATCTTCTATT 
AAAATGGAAGAATTAGCTGATGAGAACTTCATTTTATTTAATGAAGATTTC 
TATCTCAACGATAAAATTATTGAAAATGCGAAGAATGCTGGATTCGTGCCG 
AACATGGCCTCACAAATCTCACAATGGAATGTGATTGAAAATCTTGTCATT 

AATCAATTAGGTATTTCCATATTGCCAGCCACTATAGCACAATTACTTAAT
GATGACGTCAAAATTGTACATTTGGAAAATGCACATACAACTTGGGAGCTT 
GGTGTCGTTTGGAAAAAAGATAAACGTTTAAGTCATGCTACAAATAAATGG 
ATAGAATTTTTGAAAGAAAGATTATCCGAAGAATAATATAAATATAATAAC 
AAACAATGCATTTAATAAAAGAGCGAAGTTGTTAAAGATGTGAAATAATAT 

GTTTTTACAAAGTCATATTACAACGAAAACGTAGGGTTTTGAATTATAAAT
TAATTCTGCCCTACGTTTTTTTACTATCAATTAGTTCTCAGCACTATTAAT 
TTGCTAGCGTTTTCATAGATTTAAGGAATATAAAATATCATGAATAAATAT 
TTTTAATATATAATAGCAAACACTATAATAGAAAATGTCACTAAAAAAATA 
CAGTAACAATTTGGACACATTTTTAGAAAGGGGTGTTGTTGATGGAAAAAG 

CAAAATTTGTCATTAAGTTAATACTTCAACTTGCCCTTATCATGCTTATTA
CTTTTATAGGCACAGAAGTTCAAAAATTACTTCATATACCTCTAGCAGGTA 
GTATCGTAGGGCTTATGCTTTTTTTCCTATTGTTACAATTTAAAATTGTAC 
CTGAATCATGGATTAATGTAGGAGCAGACTTTTTACTTAAAACAATGGTTT 
TCTTCTTTATCCCATCAGTGGTAGGAATTATGGATGTTGCATCTAATATCA 

CGATGAATTATATATTATTCTTTATTGTTATTATAATTGGTACATGCCTTG
TAGCACTATCATCAGGTTATATCGCTGAAAAAATGCTAGAAAAAAGCAATA 
CACGTAAAGGAACTGATCACTCATGAATGAATACTTACAAGCAGTTCTAAT 
GATTTTGTTAACTATTGTTCTGTACTATGTTTCCAAAAAGATTCAAGATAA 
ATACAATAATCCACTATTAAATCCAGCTCTTATTGCATCAATTGCAATAAT 
TATTGTTTTATTGGTTTGCGGAGTAAGCTATAAGGGGTATATGAAAGGTGG
TACCTGGATTAACCATGTTTTAAACGCTACAGTTGTATGTCTTGCATACCC 
ACTTTATCAAAATAAAAAGAAAATAAAAAAATATTTAACAATTATTTTCAC 
AAGCGTGTTGACTGGTGTAGTTCTCAATTTTGTGTTAGTATTTACAACGTT 
GAAAATCTTTGGTTATTCTAAAGACACAATTGTTACCCTGTTACCTA



Sequence 3453 

step.1003d08.cons.ok

GTTATCCTCTTGTTAAAAAACATTTTCAATTTCTAATATTTTTAAAAATTC
ACTAGCTGCATATTTTAATGAAGATTCATCAATGTCAAAATGAGGATTATG 
ATGAGGTGCAGTAATGCCTTTGTCTTTATTACCACAACCAGTTAAGAAGAA 
TGCACCAGGTCGTACTTTAAGGTAATGTGAAAAGTCCTCACCAATCATCAT 
TAAGTCTGACTCATTAAATCTTAAATGTAAATCATTAGCTGCTTGTTTGAC 

TACATCATATGATTGTTGATGATTGTGTACTGGTAAATAACCTTTAATATA
TTCCAATTCATATGTAATATCGTTTGATAAAGATAAACCTTGTAAGAGCTT 
ATCCATTTTATTTTGAACATGACTTTGTAATTTTGTGTCAAAAGTACGTAC 
AGTACCTTTACAAAAAGCTGTATCTGGAATAACACTATCTGTTGATCCTGC 
TTGAATCATTCCGAAAGTAAGAACAGCTTCCTTTACTGGATCAATTGTTCG 

AGAAATTATTTTTTGGGCACTTAAAATAAACTCAGCCATAATGACAATAGG
GTCTATTGTTTCGTGTGGTTTTGCACCGTGACCACCTTTTCCATATATAGT 
CACACTAAATTCATCTGGTGAAGCCATTATTGCTCCAGGTCTAGAATAGAT 
TGTCCCAGATGGATAACCACTCCATAAGTGTGTGCCATATATTTTATCGAC 
ATTCTGTAGACAGCCATCATCAATCATCTCTTGAGAACCACCTGGCATAAT

TTCCTCACCATATTGGAATATAAAAACAACATTACCTTTCAATAAATGACG
ATGCTCATGAACAATTTCAGCTACTCCAAGCAAAATAGCTGTATGTCCGTC 
ATGTCCACAAGCATGCATGCACCCTTTATTTTTTGAACGATAGGGTACATC 
ATTTAATTCTTGAACAGGTAGTGCATCGAAATCTGCTCGTAATGCAATCGT 
TGGTCCATTTGAATCAGATCCTTTAAAAGTTGCTTTTATACCATTACGTCC 

AACTGGTGTTTCTATGGTGCATGCTAATTGGCTCAATTGATTTACAATAAA
ATCATGCGTACGTTTTTCCTCAAAAGAAAGTTCTGGGTATTGATGCAAATA 
TCGTCGTAATTGCACCATTCTCTTTTCTTTATTACTTGCTAATTGAAACCA 
ATCAAACACATGTATCACTTCTTCCATCTTAATTATTGATATTAGTATATC 
ATTATTGTTTTAGCTTAAATAGACTTTGTTTTGTTATTAAGATATCACACG 

TTTTTTACACTATCTATTAAGATTTGGAAACATTAAAATGATTAAATATAA
ATGAATAGCCCCATAAAAATACAACACAAATCATAATACCTCACTTAAAAT 
TAATAAGAAAAAATCGTCCATAAAATTTTTGTTATTAACTGTTATTATAGA 
AGATAAAGTTTATGTAAAAGTATTCAAAAATATGTAGAATACTAAAATACT 
AGGTAGTGATGTGAATTAGGCGTATTTOTAAGAATTTAATTGGTATACGAC 

ATTTTAATATTATATGAAGTTCTGGCTAAAGGAATATTAGAATTAGTAATT
ATAGTTTATTAGAAGTGGAAAAAATTAAAAAGTAAGAAAATTATTATAAAA 
TAATACCTTAACTTATATGTAGAAATATAAAATTATTATGATATAGTTTGT 
TATAATTTTTGATGTTCACAGTTAATCACATTAAAATAGAGTTATCAATAA 
TATAAGTTATGGGGTATTTAAGTGGAGGAAATAAGACCAATAACCTATCAA 

GATAAAGAGGCATACTATTATTATATTCAAGAGTGGTATGAAAATGAGGAA
AAAGTGGTACCAGGGAATACAGATATTGCCAATTATAGTTCATTTAACAAT 
ATGGTTGATCGGCTTAATTGTAGTGAAGTTGATGAGGGTTTTGTACCGACT 
ACAACACTATTTTATTTTAAAGATTCAATTATTATAGGTGCTGTTGATATC 
AGACATCAATTAAATGATAAACTATCTAATATTGGTGGTCATGTGGGATAC 

GGTGTAGCTAAATCTTATAGAGGGAAAGGTTACGCTACTATCCTTCTAGAA
AAGGCTTTAGATGAACTTAAGACATTAAATGTAGAGGTCGTACTTATGACT 
TGTAATCCACTTAATTTTGCTTCTCAAACAGTGATGAAGAAATGTGGTGGA
TATCAAATTGAATCCTATATTAAAAAAAATGGTAAACCTGTTCATCGATAT 
CATATACCTAATACAAAATAATCATGATTATTAAGAATATATTGTGATAAA 

GCAAATATATGATTTTTATTTTTAAAATGATTAGAAGAAAAGAGCTTTTTA
AAGTCAATATTATTTGAGATAGGAGCTTAGACTTTTAAGTTTATTTAAGCA 
AAAGAATTTTAATGGAGACTTACGATTTAACACATCTTGTTATATTAAACT 
TTGTGAAAAACTTAACCATCTAAATAATCCTATATAATACATTAAACGCGT 
GGGGGGGATTCATAATAATGAAATATATAGATGAGCAAACACAAGCTCAAT 
TATTAGACATGAATGAAGTCATTTTAGAAGTAGAAAAAGCTTTGCAAGCTT
TTTCAGAGAATAAGACGATAACGCCATTAAGATATGTTTTGCCATTTAATG 
AGCAAAATCGTTATTTAGTGATGCCAGCATTATCAGATGAATTAAATATCG 
TTGGACTTAAAACAGTCTCATTTGCACCTGAAAATTCAAAAAAAGGGAAAG 
CGACTATTACTGGATCAGTTATTTTAAGTGACTATGAAACAGGAGAAACAT
TGTCTATATTAGATGGTGGTTTTCTTACTAAAGTAAGAACTGGTGCAATTT 
CAGGTGTAGCTACTAAATATCTAGCAAAAGAAAACGCTAAAACACTTAGTG 
TAATAGGGGCAGGTGTACAAGCTGAAGGTTTAATTGAAGCGATACTTGCTG 
TTAGAGATATTGAAAAAATTCACATCGCAAGTAGAACGTTCGAAAAAGCAG 

AAAAATTTGCTCAAAATATACGAAATCGATTTAATATTAAAGTGAGTGTAT
TTAGATCGGCAGATGAAGCGATAGACAGTGCAGATATTGTAGTTACAGCAA 
CAAATGCAAATCAGCCCGTTTATACTCATTCTTTACATCCAGGCGTGCATT 
TAAATGCAGTCGGATCCTTTAAACCAGATATGCAAGAAATACCTTCAGAAA 
CAATGCTTGTTGCTAATAAAATTGTTGTTGAATCTATGGAAGCAGCTTTAG

AAGAAACAGGTGATTTAAAAATTCCTCAAGCAGAAGGAATATTAACTAAAA
ATATGCTACATAGTGAATTAGGCGACATTATTTCT 



Sequence 3454 

step.1003d09.cons.ok

GTGAGGGTGCTTCTGGTAAAGATGCTATCACTAAAGCGCATGAGCTAAAAC 
CAGATTTAATATTAATGGATTTACTTATGGACGATATGGATGGTGTAGAAG 
CAACTACTGAAATAAAAAAAGATTTACCTCAAATTAAAGTAGTCATGTTAA 
CAAGCTTTATAGAGGATAAAGAAGTTTATCGTGCACTTGATTCTGGAGTAG 

ATAGTTATATTTTAAAGACAACAAGTGCAAGTGATATAGCTGACGCTGTGC
GTAAAACGTATGAAGGTGAATCAGTATTTGAACCAGAAGTGTTAGTAAAAA 
TGCGCAATCGTATGAAAAAACGTGCCGAGCTTTATGAAATGTTGACAGAAA 
GAGAAATGGAGATCCTATTACTTATAGCTAAAGGATACTCTAACCAAGAGA 
TTGCAAGCGCCTCTCATATCACCATCAAAACAGTAAAAACTCATGTAAGTA 

ACATACTAAGTAAATTAGAAGTACAAGATCGAACACAAGCAGTAATATATG
CGTTCCAGCATAATTTAATTCAATAAATTATTAAAGGCGAAAGTAAAGATA 
CATCTATCGATACTTTCGCCTTTCAATATTATCAGTCGCTTGTGTGATGTT 
CTTCTTTAGATTGATGATTGATATTATACGTGTCATCTTCATTAGAGATAT 
CTACATCTTTGCTATCTTCATAGTATTCATAAGTCGTATTTTCGTTATAGT 

GATTTTCATTGTTATCATCATGTTTAATAGCGGCTTCTTCAGGCGTGTGAC
CAGCTATGACTTTTCTTTGGTGAATAATTGCATTGATTTCAGCACCGATAA 
TAATAATAAAGCTTGTGATATACAACCATAAGAATAAAATGATAATACCAG 
CTAAACTTCCGTATGTTTTCGAATAGTTACTAAAGTTTGAAATATAGTAAC 
CGAATGCAAAGGACCCTAGTAACCAAATAATGGAAGTGT^AAATAGCGCCAG 

GAATAACAGAACGTAATTTTGTTTTAACATTAGGTGCAACTGAATAAAGTA
CAGTAAATATGATGAAAATAATAATCAATGGAATAACAATTCGTACTAAAT 
TAAAAATCCATTCAATTTGATTATCAATACCTAGTGGTCCAAATAAAAATT 
TATTAATGACTGGACCTAGTGTTATAAGTACTACAGCAACAACAAATACTG 
CACCTAAAACAAGTGTATATAGAATACTTAATAATTTAACTACTACACCGT 

TTCGAGAATCCTCAACGTCATAAGCAACATTAAATGAGTTAATAATAGCGG
ACATTCCATTTGATGCTGACCAAATAGCTAAAATTAACCCAACTGACAAAA 
TACCTCCACTGGCAGTGTCAGAGATATCTTTAACAATTCCACCAACGATGC 
TAGCTGTTTCTTGATCAGGGACATATTGACTTACTTTTTGATTAATCTGAT 
TAGCATCAATCGTGATAAATTGACCAAGTAGCGTAAGTAAAAAAATTAGCA 

TTGGGAAAAGTGCTAATACAAAATGATATGTCATCTGTGCTGCTAAACCTG
CAGCGTCATCTTTACCTATCCTATAAATCAGGTAAGAAAAGAAATTAGAGT 
TTTCGGTGTATTTTGCTGGTTTATTCAGCCGGGAAACAAAAAATACTTGAT 
TCTTTTTTTTAGGTTTCTTAGATTGGAATTCTTGAGGTTCTATATATGTAC 
GGTCAACTTCTATTTTTTTATTCTTTTTATGCTCTTTATCTTCTATTGAAT 

TAAGATATTTAGAAGTTGTTTTCTCTTTTTTTGACATAACAATCTCCTTTG
CTTTTTATGTTTTACTTTTAAAATTGATATTCCCGTATTTGCGGTATAGTA 
AAACAAATATGGTTTTATTTATATATGATAAATCCAATACCCTTTAAAAAG 
TAATCTATTTTTAACACTCAATGATTATTTTGGCCTAAATTAAATAGAAAG 
GCAATAGTTTTTAAATATTGATTTATAGTTTTTAACATGAGTTAAAGTATT 
AAACAAGTGCGAGTCAAAATTTAACAATTATTGTAAAGTTTATAATCAATA
AAGATAAAAAGAGTGGAGCAGAATCCATCTAAAATGAATTCCGTTCCACTT 
CATAGTTATTAAATATCATAAACTTATATGCGTACATTTTCCAATTGATGT 
CCTACTTTCTCTGAiVU^TGCTACGCATTTGAAAGTTTATTTACCTAAGCGT 
TGATTTTTACGATTTACAAAAGTTTCTTTGGCATCTTTAATAGAACGTTCT
AATTCAGGATTATTACGACGTATTTCATCTAATGTATCTTTCCAATACATT 
ACTTCATCTTTGATGCTACTTATTTTTGAAGGTTTACGTGTACGATTTCCT 
TCTTTGATGTCTTTGAATGATTGTTTTAAAGAATGACGTGTAGATTTATCA 
GCAAGAGCAGCCGCGCCACCAATTACGGCACCAATTAAAATACCAGGTACT 

AATTTATTTTCCATAGTTGAATTACCTCTCTTTCAAATTTGCATCTTTTAC
GATGTAGTCTATTAAATTATCACAAGATGATTGCACCATCTCGAAAACACC 
TTCAAAATTATTTGTGTAGTATGGATCTGGTACATCACTCTCTTCCATGTT 
ACTAAATTCTAGCAATTTGAACAATTGTCCTTGTAAATTTGGATTGATTTG 
TTTGATATTGTCTACGTTACTTTGGTCCATAGCAATAATATAGTCAAAATC 

ATCATCAGGTTCGAAAAGTTCACTCACCATACCATCATAAGGAATATGGTA
CTTCTGTAGAATTTTTTGTGTTCCGTTATGTGGAGGTTCGCCTAAATTCCA 
ACGTCCTGTTCCTCTAGAATGAACTTTTATATCTGAAATACCTCTTTCTTG 
TAGTCTTTGTCTCATGATAGCCTCAGCCATTGGAGAACGACATATATTACC 
GAGACATACAAATGCTACATGTATCATTAGTATCACTCCAAACTATGTATT 

TATATACAAATAGCTCATATTATTACAAAGTTCAAACTTTTATACCTAAAA
ATTATATTGAAGATTTCAAAGACAGTTTAAACGAACTTGGATTAGCTAAAG 
AAGGTAATTACGAATATTGTTTCTTTGAAAGTGAAGGTAAAGGGCAATTTA 
AACCAGTAGGTGATGCAAGTCCTTATATAGGGAAGTTAGATAGTATCGAAT 
ATGTTGATGAAATAAAACTTGAGTTTATGATAAA 


Sequence 3455

step.1003d10.cons.ok

GTATCTAAGCCTAAAGCTAATGAAGCAGTAGTGACGAACGAGTCAACTAAA 

CCAAAAACAACAGAAGCACCAACTGTTAATGAGGAATCAATAGCTGAAACA
CCCAAAACCTCAACTACACAACAAGATTCGACTGAGAAGAATAATCCATCT 
TTAAAAGATAATTTAAATTCATCCTCAACGACATCTAAAGAAAGTAAAACA 
GACGAACATTCTACTAAGCAAGCTCAAATGTCTACTAATAAATCATU^TTTA 
GACACAAATGACTCTCCAACTCAAAGTGAGAAAACTTCATCACAAGCAAAT 

AACGACAGTACAGACAATCAGTCAGCACCTTCTAAACAATTAGATTCAAAA
CCATCAGAACAAAAAGTATATAAAACAAAATTTAATGATGAACCTACTCAA 
GATGTTGAACACACGACAACTAAATTAAAAACACCTTCTATTTCAACAGAT 
AGTTCAGTCAATGATAAGCAAGATTACACACGAAGTGCTGTAGCTAGTTTA 
GGTGTTGATTCTAATGAAACAGAAGCAATTACAAATGCAGTTAGAGATAAT 

TTAGATTTAAAAGCTGCATCTAGAGAACAAATCAATGAAGCAATCATTGCT
GAAGCACTAAAAAAAGACTTTTCTAACCCTGATTATGGTGTCGATACGCCA 
TTAGCTCTGAACACATCTCAATCAAAAAATTCACCACATAAGAGTGCAAGT 
CCACGCATGAATTTAATGAGTTTAGCTGCTGAGCCTAATAGTGGTAAAAAT 
GTGAATGATAAAGTTAAAATCACAAACCCTACGCTTTCACTTAATAAGAGT 

AATAATCACGCTAATAACGTAATATGGCCAACAAGTAACGAACAATTTAAT
TTAAAAGCAAATTATGAATTAGATGACAGCATAAAAGAGGGAGATACTTTT 
ACTATTAAGTATGGTCAGTATATTAGACCGGGTGGTTTAGAACTTCCTGCA 
ATAAAAACTCAACTACGTAGTAAGGATGGCTCTATTGTAGCTAATGGTGTA 
TATGATAAAACTACAAATACGACGACTTATACATTTACTAACTATGTTGAT 

CAATATCAAAATATTACAGGTAGTTTTGATTTAATTGCGACGCCTAAGAGG
GAAACAGCAATTAAGGATAATCAGAATTATCCTATGGAAGTGACGATTGCT 
AACGAAGTAGTCAAAAAAGACTTCATTGTGGATTATGGTAATAAAAAGGAC 
AATACAACTACAGCAGCGGTAGCAAATGTGGATAATGTAAATAATAAACAT 
AACGAAGTTGTTTATCTAAACCAAAATAACCAAAATCCTAAATATGCTAAA 

TATTTCTCAACAGTAAAAAATGGTAAATTTATACCAGGTGAAGTGAAAGTT
TACGAAGTGACGGATACCAATGCGATGGTAGATAGCTTCAATCCTGATTTA 
AATAGTTCTAATGTAAAAGATGTGACAAGTCAATTTACACCTAAAGTAAGT 
GCAGATGGTACTAGAGTTGATATCAATTTTGCTAGAAGTATGGCAAATGGT 
AAAAAGTATATTGTAACTCAAGCAGTGAGACCAACGGGAACTGGAAATGTT 
TATACCGAATATTGGTTAACAAGAGATGGTACTACCAATACAAATGATTTT
TATCGTGGAACGAAGTCTACAACGGTGACTTATCTCAATGGTTCTTCAACA 
GCACAGGGGGATAATCCTACATATAGTCTAGGTGACTATGTATGGTTAGAT 
AAAAATAAAAACGGTGTTCAAGATGATGATGAGAAAGGTTTAGCAGGTGTT 
TATGTTACTCTTAAAGACAGTAACAATAGAGAATTACAACGTGTAACTACT
GATCAATCTGGACATTATCAATTTGATAATTTACAAAATGGAACGTACACA 
GTCGAGTTTGCGATTCCTGATAATTATACGCCATCTCCCGCAAATAATTCT 
ACAAATGATGCAATAGATTCAGATGGTGAACGTGATGGTACACGTAAAGTA 
GTTGTTGCCAAAGGAACAATTAATAATGCTGATAATATGACTGTAGATACT 

GGCTTTTATTTAACTCCTAAATACAATGTCGGAGATTATGTATGGGAAGAT
ACAAATAAAGATGGTATCCAAGATGACAATGAAAAGGGAATTTCAAATGTC 
AAAGTGACGTTAAAAAATAAAAATGGAGATACCATTGGGACAACGACAACA 
GATTCAAATGGTAAATATGAATTCACAGGTTTAGAGAACGGGGATTACACA 
ATAGAATTTGAGACGCCGGAAGGCTACACACCGACTAAACAAAACTCGGGA 

AGTGACGAAGGTAAAGATTCAAATGGTACGAAAACAACAGTCACAGTCAAA
GATGCAGATAATAAAACAATAGACTCAGGTTTCTACAAGCCAATATATAAC 
TTAGGTGACTATGTATGGGAAGATACAAATAAAGATGGTATTCAAGACGAC 
AGTGAAAAAGGGATTTCTGGTGTTAAAGTGACGTTAAAAGATAAAAATGGA 
AATGCCATTGGGACAACGACAACAGACGCAAGTGGTCATTATCAATTTAAA 

GGATTAGAAAATGGAAGCTACACAGTTGAGTTTGAGACACCATCAGGTTAT
ACACCGACAAAAGCGAATTCAGGTCAAGATATAACTGTAGATTCCAACGGT 
ATAACAACAACAGGTATCATTAACGGAGCTGATAATCTCACAATTGATAGT 
GGTTTCTACAAAACACCAAAATATAGTGTCGGAGATTATGTATGGGAAGAT 
ACAAATAAAGATGGTATCCAAGATGACAATGAAAAAGGAATTTCTGGTGTT 

AAAGTAACGTTAAAGGATGAAAAAGGAAATATAATTAGCACTACAACAACT
GATGAAAATGGGAAGTATCAATTTGATAATTTAGATAGTGGTAATTACATT 
ATTCATTTTGAGAAACCGGAAGGCATGACTC/^AACTACAGCAAATTCTGGA 
AATGATGATGAAAAAGATGCTGATGGGGAAGATGTTCGTGTAACGATTACT 
GATCATGATGACTTTAGTATAGATAATGGTTATTTTGACGATGATTCAGAC 

AGTGACTCAGACGCAGATAGTGATTCAGACTCCGACAGTGACTCGGACGCA
GACAGCGATTCTGACGCAGAC 



Sequence 3456 

step.1003d11.cons.ok

GCATTTGCTAAGGGACTTGTTTCATCGTTTGGGCCATTTCCTTCAGCAACA 
ATACAACATATAGGCTATGGCGCCGGCAGTAAGGATTTTGATTTCCCTAAT 
ATATTAAGGGTTATTCAATTTGAATCTGAATTCGAGCAACAAGATAGCGTC 
CAAGTAATAGAGTGTCAAATAGATGATATGACACCTGAAGCATTAGGTTAT 

TTTATGAATAATGCGTTAGAGCAAGGTGCTTTAGATGCTTACTATACGCCT
ATATTTATGAAAAAAAGTCGCCCAAGCACGCAGTTAACGTTAATATGTAAA 
TTACATGATAAGACATATTTCGAACAACTTATCTTACAAGAAACAAGTTCT 
TTAGGCGTCAGAAGTACTTCTGTTAATAGAAAGACCTTGAACCGCGCATTC 
AAAATTCTTTCTACACAACACGGCACTGTTTCCATTAAATTTGGCCTACAA 

AATGGAAAAATTATGAAAATGAAACCCGAGTATGAAGATTTGAAGAAAATA
GCTAAAACTACAAAACAACCGTTTCAAGTAATTCATAACGAGGTATTACAA 
CAACTCTATCAAACATATCATATAGGAAATATACTTCAATAAATTAGTTTT 
ATTCATTTAAATTTTGCACATCAAAAAAGGATAGGGACAGTCATTCTACAT 
AAAACAGTCACCTATCCTTTTATTCTATTCAAATGCTCGATATAGTTCTTC 

ACTAGAATTATTCCACATTTCAACGTTATGATTTTGTAGAAATTCTTTTAA
AATTTGTTTGTTATCATCACCGATTTGGTCGATGATAAATTGATGTTTCAT 
AGACTTATCCATCATATTAACATGTTCTGGCATGCTCTTATAACCACGTCT 
AATACTTTTGTTAACTAATAAATAGCATGCAGTCACACCCGCATAATATGG 
CCCTTCTTCACTACGTTCGGTCGTCACCCAACATAGCCAGTATGCTTTAGC 

CTCCTCACCTTCAACTTGTGATTTGTCAGTAATCCACTTGACACCCTTTTC
AACTTCAGCACGTGCGTGCATACCACCAATATCAACAAATGCCTTCTGATT 
TTCAACATCTATAAAAACCGGGGCAATATTATCTAAACTAATCGAACCAAT 
ATTGGTTCCTTTATGACCATCTAGTGGATCATTTTTTATAATATTGAAATT 
GAAACCTTTTTTCTCAGACAAACGAGATACCTCCTAATAAGTCATCAAATA 
AAGACTATATATCTTATTCTTAACTGTTAATTTAATTCATTAAGTAAGTTA
ATAATAGAACCTTTAATTTGATAATCTTTAATTTTTAAAATCATCTCTTTA 
GGATAATATAATTTATGGTCTTCTCTATTAATCCCTGTAATACTATTAATC 
ACCAATGATTGATTACTTATTTCTGTAATTCCACCATTACTTTTTAATAAA 
TGAATCGGTTTGCGATTTGAGCCTGGGCGATCATAATCATAAGGTAAATCT
GAAAATGCTTCACTTACAAAGTAATAATCAGGATTAATACCACCCGCTTCA 
AATAATTCTTGCAATTCCGAAATGGTAATAATTGAACCGTCGAAAGGAATA 
TATTTAAATAAATCTCGATTGATAAACCGTCTTGATAAATCACTTAATATT 
GTATCATTTTCATGAATCCATTTCTTCAAGTAATACAATACAACTGCTTCA 

TCAAGTTCTACATATTGTTCAATCGTCATTGTTCCTTCAAAGAATGGTATA
AAGTCTTTTGGATACATTTTAAATTCATATCCTTCATTATAAAGCTGCTTA 
GCTCGTTTTAAACAATTGTTTAATAAAACTTCCCCACCACGGCTTACTGGA 
TGAAAATATATTTGCCAATACATTTGATAACGACTCATAATAAAATTTTCG 
ACAGCATGCATACCACTATCTTTAATTAACACTTCTTCTTTAGATGGTCTC 

ATCAACCTTAAAATACGCTCCATATCAAATGAGCCATACGTTACGCCTGTA
AAATATGCATCTCTTTGTAAATAGTCCATTCGATCAGCATCAATTTGTGAG 
GAAATCATAGAAATAACAAGCTTATTATCGTGCGTTTTATTAATTACATCA 
GCTACTTGTTTAGGAAAGTTATCAGAGACTCGACTTAGTACACCATTAACC 
TCAGTGTCTCCTGTGATAATAGCTTGTGTAAATGCTTCATGGTCTGTATTA 

AATATTTTTTCAAAACTATGAGAGAAAGGACCATGACCTAAATCATGTAAC
AACGCAGCACACAACGCAAGTGGCCGATCAGTATTATCCCACGCGTCTTGT 
CCTATAAATGTCTCATCAATCATTCGGCGCACAATTTCATAAACACCCAAA 
GAGTGTCCAAAACGACTATGTTCGGCTGTATGAAATGATAAATACAAAGTA 
CCTAGTTGTTTAATACGTCTCAAGCGTTGGAATTCTTTTGTTTTAATTAAG 

TCCCAAATCAATTGATCCTGTACGTGGATAAATCGATGAATTGGATCTTTA
AATACCTTTTCTTCTGTTAATTTACTCGTTGCATATGATGATAGCGTCATA 
CTTTCCCTCCTATCCTATATATTCCAAAAAACATTTATTTTTTGTTCCTCA 
TATTACTAGACGAAACAATGTTAATTATTCATATGTTTGTTAAAATGATTT 
CTTCATTAAAGCTAAATCCATGATTTCACCATACATTTCAGGTTGATATGA 

GCGAATAATCTCGAATCCATGATTTTGATAATATTCGATGCCATTACTATT
ATTGTTATCAACTTCTAAATAAACTGTTTCATACTGGTCTTTAAACCTTTT 
CAACCCCGCTTCTAGCAAACGAGTTCCATAACCTCTATGTTGAGATTCAGG 
TCTAACATAGTGTGCAGATAAATAAAGCTCTTCACCATAAATGAAATTTGC 
AAAACCAACAATCTCACTATCTTCTTCTGCCACTAAAAATAATTGCTCATT 

TAGTCTCTTTTTTAAATGCGTTTCATTATATGAAGCGGCCAACAGTTCATT

CGCAACATCTCTTATCGCTACAACATCTTTTTCAGTTGCTTGTCTGACACT 
ATACATGTGACTTCTCCCCTAACTATTTACTTCACTATTAATCGATGGGTG 
GATTGGTAACTTGTGCAAATAATTGTTTAAAATAATTAAATAGTGACTCAG 
GCTTATCATTAAGAACTGCAATAGGTGCATCATATCCCATCGCTCCGATGG
GATCTTTTTTATTGATGACTAAATCTGTCATATATTTATTCTTTAGAGCAC 
AGTGGCGATGATATC 



Sequence 3457

step.1003d12.cons.ok

ATTTTTCAATCATCGATAAACTTAAACTAATTCATTCTCATATTGGGAGAT 
CTTAATATTTTAATTCATATATAAATAGTAATTTTAGACACTGATCAATCT 
TTTAAGTTGGTTATTTTAAATATCAGGTTCCAATTTTTTCTTTTACAACTT 

AACAATTCTACTTACCTCATATACCTTCTAAAACAGGTTTCATAATTAATG
ATAAATGACGTTTGTATCACATCAAAGGCATTTGCCTATATTTTTCAAAAA 
ATGTGTCCTTTTTTTTCAATCTATTTCGATATATATAGTGAAAGGAGGTTT 
TATTATGAATGAAGTCAATCAAGTGACGATTATCTTTACTCAAATCAAACA 
TAATTCTGAAGGTAAAGTAGTCGAACATAAGCGTCGTTTTTCTAACATTAA 

TCCGCTTGTTTCTAATGATCAAATCAAATCTTTTCGTTCCATCATTGAACG
TATCAGTGGCGAAGAGTATGACAAAGTCGAGATCGTTAAAACAGAATCTTT 
AATTTAAGGAGGATACATCAGTATGTCAAAAACTTTGGAACTCGTATTTAA 
ATCAAATCTTAATAAACCCGTCAAATTACTATTACCCGATTTTAACGCTAT 
TACTACAGAACAATTAATTAAAGAAAGTATGAACCAACTTTTAGAATTAGA 
CATTTTAAGATTTAGTTTGGGAAAGCCTGTCAAAATTTATGCTGCACAACT
TATTGATAAAAGTACAACTGTTATATTTGAAGATAAAAATTAATTCATAAA 
ACATTATTTTTGACGCTTTATTTAAAGTACTAAAATCATAAAAAAGACTTA 
TCGAAGAACAAATTTTTTAAAATTGTTCAAGTCTAAAAAAGAACTTAATTT 
ACATTGTTCTTTTATTAATATGAAGTGAGATATCAAATTCAAAATGATGTC
TCGCTTTTTTTAATTCTAAAACTTGTTTAATGTGCATTTTATTATAATGAT 
TCATCAAAATAGTATCGGCTGTATGACACCTCGGAACATTCACAACTTATT 
TCTATACTTTGCTTATGGCTCAATCATTTTAGAAACATGATTATGAATGGT 
ATGATAATCATATTAAATTGAGAAAGTTGGTTTTTATCATTGTTTTTTCTA 

TATCTTTTAGGAATTTTTGTTGGCATGTTACTTCCTATTCAAACTTCAATC
AACTCAAGGCTAAGCCAATTTACCCGTTCATCTTTTTATGCTTCAACCATA 
TCTTTTGCTGTCGGAACAATATGCTTACTTGTGCTTAATATCATTATTCAT 
CCACAAGTATTAACACCAGAATTCTTTTCTAAGCAAACACTTAACTATACT 
TGGGTCCTAGGAGGACTATTAGGTGTTATCTATTTAACTGGAAACTTATTA 

TTATTACCAAGATTAGGCGCAGCACTTACTGTTGTTATTACGGTTACAGGG
CAAATTATTATGGGTGTAATCATTCATACATTTGGATTATTGGGCGCCCAT 
CAACAATCTTTTACATTATTTAAAGGTGTTGGAATTATATTTTTAATTACT 
GGAATTATATTTATGAATTATGTTCGTAGACATCCTGTTAATAGACATAAG 
AATACACCAATAGTATTCTGGTTGCTTATTGGATTCGTATTTGGTTTCGCT 

CCGCCAATCCAAACAACAATCAACAGCACACTGGCTCAACACACACACTCA
TCCATATTTGCTTCACTCATATCATTTAGCGTTGGTACAATAGCGTTATTT 
ATATTAACATTAGTATTTAATCGTAGTCTGAAAATATCATCTACTCATAAA 
ACACTAGGTAAGATTAAATCTATTTATTTTATTGGAGGGATACTAGGAATG 
GCATTCGTAACATCTAATATTATTTTAATGCCTTTCTTAGGAGCAGCTTTA 

ACTACTATAATTGCTATGATGGGCCAAATGATTATGGGTATTATTATCGAT
CATTTCGGTTTGTTGGGTTCACCTAAAAACAAAATAACATCACGTAAGTTA 
GGTGGATTATTATGTATAGCTATAGGTATTATTTTATTACGATTATTTTAA 
ATTCCTTATTCTGAATAAAAATTATCCGCTTATATTAATGTAGATTCCTCA 
TACAAATAAATGTCTATTTCGAACTGCCAACAAAGATTTTGTCTTTGTTGG 

CAGTTTTTATGAGTGATTCTTCTTATAGTAACATTTATGTAGAATCATATA
TTTTAAAATTAAAGATAGCCTTCTATTTTTCAATTTATTAAATTGATTTAC 
TTTATTTTTTGAAATAAAACAATAGTAATTGTGCACCATATTGTTCCGTCT 
CTTCTCCTGTAGTAGCTAATAAACGTATCGTACCTTGCTTTAATGCGTTTG 
ATTCTTTTAATTCAATCATCGCGATGACCATCGCCATAAGACCACCTTTCA 

TGTCAGTGGTACCTCTGCCGAATAACTTGCCATCTTTATCTGTGAGTTCAA
ATGGAGGAAAAGTCCAATCATCATGGTCACCAGCATCTACAACATCCATGT 
GTCCACTGATTGCTAAGACTGGAGCGCCACTACCAATTTCAGCAACTAAAT 
TAGCGCGTGAATCATTAACTTTAACAATCTTAGAATCAATATCATATTGAC 
TTAACAGGTCTTTTAAGTACTCACATACTTTAATTTCATGATCATTCTCAG 

TTTGAATCTTGACGATATCAGCTAATAATCTTATTTTGTCTTGCTCACTCA
ATACAGTCATTTCACAGTCCTCCTCATTTTTAGTAATGATTTTCACTTTTT 
AATAAAAAATAAAAATCCTTGTACACTTAATATTATCAAAAACTTTTATTA 
ATTAAAAATTATATGCTAAAATAAATGGATTTTTAATAAAAAATAAATATT 
AACTTAACATTTCTTATAAAAAACAACAATGTATAGAAATATTAAAAATTT 

TTCATTATTTATAGTTTGAATCGTGTTATAAAATGAAAAGCTATGAGATTA
AGATGTAATAATACACACATATATAAATATAACTTCGTGCCACGCAGACGG 
TTAAAATCATGTATAATAGCTAATAATACATATTAGGAGATACATAATTAA 
TGATGAAAAATAAATTAACATTAAAAGAGAATCTATTTATCGGCTCAATGC 
TGTTTGGTCTTTTTTTTGGTGCTGGAAATCTCATTTTTCCAATTCACTTAG 

GTCAAACTGCGGGGGCAAATGTATGGACCGCCAATTTAGGATTTCTTATCA
CGGCTATCGGACTACCTTTTTTAGGAATTATAGCGATAGGTGTATCTAAAA 
CAAACGGGGTCTTTGAAATTTCCTCAAGGATAAGTAAAATATATGGTTATT 
TGTTCACAATTGGCTTGTATCTTGTTATAGGTCCGTTTTTTGCGTTGCCAA 
GACTTGCGACGACGTCATTTGAAATAGCATTOTCACCATTTATTTCATCTG 

GTACGGCCCAAGCGTTGTTGCCTATTTTTAGTATTTTATTCTTCGGAGTAG
CGTGGTTATTTTCGCGTAAACCTTCTAAAATATTAGACTATATTGGAAAAT 
TCCTTTAGAGCACAGTGGCGATGATAT 
Sequence 3458

step.1003e02.cons.ok

GTCTCGCAGTCAAGCTCCCTTATGCCTTTACACTCTATGAATGATTTCCAA 
CCATTCTGAGGGAACCTTTGAGCGCCTCCGTTACCTTTTAGGAGGCGACCG 

CCCCAGTCAAACTGCCCGCCTGACACTGTCTCCCACCACGATAAGTGGTGC
GGGTTAGAAAGCCAACACAGCTAGGGTAGTATCCCACCAACGCCTCCACGT 
AAGCTAGCGCTCACGTTTCAAAGGCTCCTACCTATCCTGTACAAGCTGTGC 
CGAATTTCAATATCAGGCTACAGTAAAGCTCCACGGGGTCTTTCCGTCCTG 
TCGCGGGTAACCTGCATCTTCACAGGTACTATGATTTCACCGAGTCTCTCG 

TTGAGACAGTGCCCAAATCGTTACGCCTTTCGTGCGGGTCGGAACTTACCC
GACAAGGAATTTCGCTACCTTAGGACCGTTATAGTTACGGCCGCCGTTTAC 
TGGGGCTTCGATTCGTAGCTTCGCAGAAGCTAACCACTCCTCTTAACCTTC 
CAGCACCGGGCAGGCGTCAGCCCCTATACATCACCTTACGGTTTAGCAGAG 
ACCTGTGTTTTTGATAAACAGTCGCTTGGGCCTATTCACTGCGGCTCTTCT 

GGGCGTGAACCCTAAAGAGCACCCCTTCTCCCGAAGTTACGGGGTCATTTT
GCCGAGTTCCTTAACGAGAGTTCGCTCGCTCACCTTAGAATTCTCATCTTG 
ACTACCTGTGTCGGTTTGCGGTACGGGCACCTGTTATCTATCTAGAGGCTT 
TTCTCGGCAGTGTGAAATCAACGACTCGAGGAAACAATTTCCTCTCCCCAT 
CACAGCTCAGCCTTATGAGTGCCGGATTTGCCTAACACTCAGCCTTACTGC 

TTGGACGTGCACTCCAACAGCACGCTTCGCCTATCCTACTGCGTCCCCCCA
TCGATTAAAACGATACTAGGTGGTACAGGAATATCAACCTGTTATCCATCG 
CCTACGCCTGTCGGCCTCAGCTTAGGACCCGACTAACCCAGAGCGGACGAG 
CCTTCCTCTGGAAACCTTAGTCAATCGGTGGACGGGATTCTCACCCGTCTT 
TCGCTACTCACACCGGCATTCTCACTTCTAAGCGCTCCACATGTCCTTGCG 

ATCATGCTTCGACGCCCTTAGAACGCTCTCCTACCATTGTCCAAAGGACAA
TCCACAGCTTCGGTAATATGTTTAGCCCCGGTACATTTTCGGCGCAGTGTC 
ACTCGACTAGTGAGCTATTACGCACTCTTTAAATGATGGCTGCTTCTAAGC 
CAACATCCTAGTTGTCTGGGCAACGCCACATCCTTTTCCACTTAACATATA 
TTTTGGGACCTTAGCTGGTGGTCTGGGCTGTTTCCCTTTCGAACACGGACC 

TTATCACCCATGTTCTGACTCCCAAGTTAAATTAATTGGCATTCGGAGTTT
GTCTGAATTCGGTAACCCGAGAGGGGCCCCTCGTCCAAACAGTGCTCTACC 
TCCAATAATCATCACTTGAGGCTAGCCCTAAAGCTATTTCGGAGAGAACCA 
GCTATCTCCAAGTTCGATTGGAATTTCTCCGCTACCCTCAGTTCATCCGCT
CACTTTTCAACGTAAGTCGGTTCGGTCCTCCATTCAGTGTTACCTGAACTT 

CAACCTGACCAAGGGTAGATCACCTGGTTTCGGGTCTACGACCAAATACTC
AACGCCCTATTCAGACTCGCTTTCGCTGCGGCTCCGCATTTGCTGCTTAAC 
CTTGCATCAGATCGTAACTCGCCGGTTCATTCTACAAAAGGCACGCCATCA 
CCCATTAACGGGCTCTGACTACTTGTAAGCACACGGTTTCAAGTTCTCTTT 
CACTCCCCTTCCGGGGTACTTTTCACCTTTCCCTCACGGTACTGGTTCACT 

ATCGGTCACTAGAGAGTATTTAGCCTTAGGAGATGGTCCTCCCAGATTCCG
ACGGAATTTCACGTGCTCCGTCGTACTCAGGATCCACTCAAGAGAGAATAT 
GTTTTCGACTACAGGATTATTACCTTCTTTGATTCATCTTTCCAGATGATT 
CGTCTAACATGTTCTTTTGTAACTCCGTATAGAGTGTCCTACAACCCCAAC 
AAGCAAGCTTGTTGGTTTGGGCTCTTCCCGTTTCGCTCGCCGCTACTCAGG 

GAATCGATTTTTCTTTCTCTTCCTCCGGGTACTAAGATGTTTCAGTTCTCC
GGGTCTGCCTTCTGACATGCTATAAATTCACATATCGATAACATGACATAA 
CTCATGCTGGGTTCCCCCATTCGGAAATCTCTGGATCAACGCTTACTTACA 
GCTACCCAAAGCATATCGTCGTTAGTAACGTCCTTCATCGGCTTCTAGTGC 
CAAGGCATCCACCGTGCGCCCTTAATAACTTAATCTATGTTTCCACCATAT 

TTTGAATTGTTATTCAAAATAAATAGCTAAAACTAGTTATTAATCTTGTGA
GTGTTCTTTCGAACACTAGCGATTATTTATGAATTCAAGCTTATTTAAAAC 
TCTATTCACTCGGTTTTGCTTGGTAAAATCTTACTTACTTATCTAGTTTTC 
AATGTACAAATGAATGTTAATAAACATTCAAAACTGAATACAATATGTCAC 
GTTATTCCCTCATCTTCGTAGAAGATGTTCCGAATATATCCTTAGAAAGGA 

GATGACAAGTTCATATAATCATTGTCTAAATTTCTGATATTTGGAATTTTG
TCATGATTTAAAGTCTCAAGCAGTTTAGCTTTTTTTAATTTTTGAACAGTA 
TATTCACTTGGAAAAGCTACATCATAGTGTGTGCCTCCATTTCGAATTTTC 
GCTTCCATTGCTTCATTTGAATCAAAGGTTTCATATACGACTTGTATACCT 
GTTTCTTTTTTAAACTTCTTAATTAAGCTAGGATCAATATATTCTCCCCAA
TTGTAAACATATAACTTCTCATTTGAGTGGGTATTATCTTTTGATTTAAAC
CAATGACTAATCATTAAACATAAAATACCAATAACTATAGCAGAAATGATA 
AGTTGAAAAAATCGCTTCATTCATTAACACCTCGCTTCATCTTTTTCTGAC 
GATTAACGATATGTTGAATAAAGTAATATCCAAATACTCCAAACATAATAG 
CAATAAATATTATTGTAGAAATGGCATTAATCTCCATACTTATACCTTTTC
GAGCCATAGCATAAACTTCTACAGACAATACACTAAATCCGTTTCCAGTAA 
CAAAGAAGCTTACCGTAAAATCATCTAGAGAATATGTTAATGCCATAAAGA 
ATCCTCCTATAATCGAGGGCATAATATTAGGTATAATAATGCTAGTTAATA 
ATTGTGATTCATTAGCACCTAAATCTCTTGCTGCATTTAACATGTTATCAT 

TCATTTCATATAATTGGGGTAAGACGATGATAACAACAATTGGAATACAGA
ACGCTATATGAGATGCTAGCACTGTTGAAAATCCTAAACCTAAACCTGTGA 
AATGTCCTATTGCAGTGAACATAATTAAAAAAGAAGCACCAATGACTACGT 
CTGAAGATACCATTAATACATTATTCATCGTTAATAGCGTAACCTTAAACC 
GTTTGTTGCGTAAGTAATATAAAGCGATCGCACCAAATGTACCTATAACTG 

TAGCAATCGATGCAGATAAAAGTGCCACTGCAACTGTATTAAAAATCACCG
ACATCAACCGATCATTATGAAAAAGGGAGTGGTAATGTTCTAAAGTAAAAT 
GTTCGAAGTGACTCATATTACCCGCAGAATTAAAAGAATAAATCATTAAAA 
AGAAAATGGGAATATAAAGCACACTTATTAATAGAGCTATATATAATTTTC 
CGTACCATTTCACACTAATCACCCCTTCCCATTAGATGATTTTGATTTTGT 

AATAATGAGTAAAAAGGCCATAAAAATAATGAGAAAAAGTGCTATAGTGGA
ACCTAATCCATAATTTTG 



Sequence 3459 

step.1003e03.cons.ok

TTTTTGTAATTTTACTGCTTCAATCATTTTCACAATAAGTTGTTTAAATTC 
ATTGATATTAAGGTGTAATTGAGTGTCATCTTTAGCGTGGTTAATAGTTAA 
TAAAGCTTTTTCAAAAGATTTTAATGCGCCTTTTAAATTCCCACGACGATA 
GTGATAATTAGCTGTTGCAAACAAAATGAGACTTACTACTGCATCATGTTT 

TGAAAAAGTATTTTGTGATTTCCAAGCATCTTCTAAAATATCATGACATAG
GAAGTAATGTTGATGCTTATGAAATTGATAATAATAATCCATTAATGATTG 
ATCCATTAGATTTCACCCATAATTTAAAAGTTGTATGTATTCATGCTATAA 
TAATTTAGTGAAATATGCATTAACGATGCAATTTGATATAGAGGTAGATGT 
AATGTATGAAGTAAAACTCGATGCATTTAATGGTCCATTAGACTTATTATT 

GCATCTAATTCAAAAATATGAAATTGATATTTATGATATCCCTATGAAAGC
CTTAACTGAACAGTACATGCAATATGTTCATGCGATGAATCAGCTAGAAAT 
TAATGTTGCTAGTGAATATTTAGTTATGGCATCAGAATTACTAATGATTAA 
AAGTAAATTCAACGATGAATATAACCAAAAATCCAACACTAAAAACATGCT 
GTGCGGCTTCTGCACTATGAACTGTTGTTTCACCATTAACTGTTGTAGTAA 

AATTAGCATCCATCGCTCCAGTAATTGAAAATACCAAGAAAACTAAAAACA
ATAGACTAGAAAAACTTTGTAAAATGATTCCTATCTIATGATGTTATATTAA 
TTGCTTTATTCATCATAATTCTCCTTTCATTTTCTACGTATATCCTTACAA 
TCAGGTTTATGTTCTTTAATATAATCTTTAGGATTTATATCTCGTTTATAG 
TAGACTTTCTTTCCCTTTATTTCATATGCAACATAAAATAGTTGCTCTTTA 

TTTTTGAATAATTTATAACTAATAATAATCAATTTTCCATATAAAGAAGTA
GATTTTTCCGCTTCACCAGAAATAATTAAAGGTGTTCTTGCTTCGTCGATC 
AATATGGAATCGACCTCATCAATAATTGCAAAATGTAGAGGACGCATTACT 
CTCTCTTCAGCATAGTTCACCATATTATCTCTAAGATAATCAAACCCAAGT 
TCATTATTCGTACTATAAGTGATATCTTGTGCGTAAGCCTCACGTTTTTCT 

TCAGTTGACTTACTATTTAAGTTCAAACCTACAGTTAAGCCAAGATAGTTA
TATAGTTCAGCCATTTCTTCACTTTGTGAACTTGATAGATATTCATTGACT 
GTAATAACATGTACACCTCTACCAGCT/^GCATTCAAATACGTCGGCATG 
GTTGCAGTCAATGTTTTCCCTTCACCTGTTCTCATTTCTGCAATATCACCT 
TTATGTATAGCAATACCACCCATTACTTGTACTTTATAAGGAATCATATTA 

AATACTCTCTTTGACCCCTCACGTACAAGTGCATAAGCTTCAGGTAAAATT
TTATCTAAATAATCATTTTGTTTTTTTACATCTTGATGATTTAGTGATCAC 
CACAATCCTTTTATCATGTAACTAACCTAGTAAATTTAAGTTTCAACTTTT 
ACTTATAGTGTTTGACTTTAATTATAAATCAAAATTCGTCATAATAAGACG 
TGCTAAATTTACAATAGCTAAAACATGATAGAGGTGACTTTTATGTCTACG
AATAATGAGATTGAATTTAAACAAATACTAGATCAAGATACTTACTCAAAA
ATCTATGAACACTATTTCAAAAATCAATCACCTTTTAAGCAAACTAATTTC 
TATATCGACACAGAGAATTTTAAATTAAAACAGCATCATGCTGCTTTGCGT 
ATAAGGGTAAAAGATTATATGTTTGAAATGACTTOTAAAAGTCCCTGCTGAA 
GTTGGATTGACAGAATATAATCACTCAGTAAATATAGAACCTGAACTTGAT
ATGTCACTTCAACTTTCTCAATTACCCAACGATATTAGAAATATTTTAGAA 
CAGGACTTTAATATTTTAGAAAATGAGCTTAAAGTACTAGGAAACTTAACT 
ACCTATCGTTTAGAAACCGATTATCAAAATGAATTACTAGTATTAGATAAG 
AGTGAATATCTCGGCAAAACTGATTATGAATTAGAGTTTGAAGTTCATTCT 

TATGATGAAGGATATTCAAAATTTAAAACTTTACTTCAACATTTTAATCTT
CAACATCAAAAACCCTTGAATAAAGTGCAACGTTTTTTTCAAGAAAAACAA 
AATGCAAGTGATAAAGAGTAACAGTTGTCGATAAGGCAAATTACTTTTAAA 
TGTAAAAATTCATGTTATATTAGATATATATTAATGGAACACGGTGATATT 
ATGTCTAAAACACCATATGAGTTGATTGGTCAAAAAGCCTTGTATCAAATG 

ATTGATCATTTCTATCAACTTGTCGAGAAAGATTCTCGTATCAATCATTTA
TTCCCAGGCGATTTCAAGGAAACCAGTCGAAAGCAGAAGCAATTTTTGACA 
CAGTTTCTTGGAGGTCCTGACTTATATACCCAAGAACATGGTCATCCCATG 
TTAAAACGAAGACATATGGAATTTACAATTAGCGAGTATGAACGTGATGCA 
TGGCTTGAGAACATGCATACTGCTATTCAACACGCCAAACTTCCTGCGGGT 

GTAGGCGATTACTTGTTTGAGCGATTAAGACTTACTGCAAATCACATGGTA
AATTCCTAAATTTAATTGTAGGTGAAAATACATGGCTGAAGAATTAAGAAT 
CATGGAGAATAAGAGTCGTGAAGATACTAATCTATCACCTGTTAGCAAAAT 
AGAAATCTATTCTTTTTTTGATCCTTTTAGCAAAGATTGTTTTAAATTATC 
TGCAATCTTATCAAAATTAAGAATTGAATATAATAAATATATAAAGGTAAG 

ACATATTTTAAACCCTTCTTTAAAGGTATTAACTAAGTGTCAAGCTCAAAG
TACTTCAGATTTTGACAATATTGCACTTGCCTATAAAGCCGCTGAACTTCA 
AGGTCGTATCAGAGCAGAAAGATTTATACATTTAATGCAAAATGAAATCAT 
TCCAAAACGTGATATTATTACCGAAGATATGATTTCTGATTGTATTAATAA 
TGCCGGCATTGACTATCAAGTTTTTAAAGAAGACTTGCAAAAGGACAAGTT 

GACTGACAGCTTGAAAGTTGATCTTCACATTGC



Sequence 3460

step.1003e04.cons.ok

TATATCGGCACTGAAAAGCCTATGCTAAAAAATAAAAATATCGCGCTTCTT
TTTGAAAAAGATTCCACTAGAACACGTTGCGCATTCGAAGTTGCCGCACAT 
GATCAAGGTGCACACGTCACTTATCTTGGACCTACAGGTTCTCAAATGGGT 
AAAAAAGAAACTGCTAAAGATACAGCACGTGTACTTGGTGGTATGTATGAT 
GGTATTGAGTACCGAGGTTTCTCTCAACGTACTGTAGAAACATTAGCGCAA 

TATTCAGGTGTTCCGGTATGGAATGGATTAACCGATGAAGATCACCCTACA
CAAGTGCTTGCTGACTTTTTAACTGCTAAAGAAGTATTGAAAAAAGAGTAT 
GCTGATATCAACTTTACTTATGTTGGCGATGGACGTAACAATGTTGCTAAC 
GCATTAATGCAAGGTGCTGCCATTATGGGTATGAATTTCCATCTTGTTTGT 
CCTAAAGAACTCAATCCGACAGAAGAATTATTAAATCX3TTGCGAACGTATT 

GCGACGGAAAATGGCGGTAACATTTTAATAACAGATGATATTGATAAAGGC
GTGAAAGATTCTGATGTTATTTATACAGATGTTTGGGTATCAATGGGCGAA 
CCTGATGAAGTATGGCAAGAACGCCTTAAACTTTTAAAACCATATCAAGTT 
AACCAAGCATTATTAGAAAAAACAGGCAATCCAAATGTTATTTTTGAACAT 
TGTTTACCTTCTTTCCACAATGCAGAAACTAAAATTGGTCAACAAATTTAT 

GAAAAATATGGCATTAGTGAAATGGAAGTCACTGATGATGTCTTCGAAAGC
AAAGCTTCTGTAGTATTCCAAGAAGCTGAGAATAGAATGCATACAATTAAA 
GCGGTCATGGTAGCAACTTTAGGAGAATTCTAAATTATAGAAGGAAGTGAG 
TGAAATGGCTAAAATTGTAGTAGCTTTAGGTGGAAACGCTTTAGGAAAATC 
ACCACAAGAACAACTTGAATTAGTAAAAAATACAGCTAAATCCCTAGTAGG 

ATTAATTACTAAAGGTCACGAAATTGTGATTAGTCACGGTAATGGACCACA
AGTAGGAAGTATTAACCTTGGTCTGAATTATGCAGCTGAACACGATCAAGG 
TCCTGCTTTTCCATTTGCTGAATGTGGCGCTATGAGTCAAGCCTACATCGG 
CTATCAACTTCAAGAAAGTTTACAAAATGAACTTCATTCAATGGGCATAGA 
TAAGCAAGTTGTCACACTAGTTACCCAAGTAGAAGTTGATGAAGGCGATCC
AGCTTTTAATAGTCCAAGTAAACCCATCGGTCTGTTCTACACTAAAGAAGA
AGCAAATCGTATTCAACAGGAAAAAGGTTATCAATTTGTAGAAGATGCTGG 
TCGAGGTTACCGTCGCGTTGTACCATCACCACAACCAATATCTATTATCGA 
ACTGGAAAGTATTAAAACTCTAGTAGAAAATGACACACTCGTCATCGCTGC 
AGGTGGAGGTGGTATACCAGTCATTCGCGAACAGCATGATAGCTTTAAAGG
TATAGATGCCGTCATCGATAAAGACAAAACAAGTGCATTATTAGGTGCTGA 
TATTCACTGTGATCAACTCATTATTTTAACAGCGATTGATTATGTTTATAT 
CAACTATCATACTGACCAACAACAAGCACTTAAAACAACAAATATAGATAC 
GCTTAAAACATATATTGAAGAAGAACAATTTGCCAAAGGCAGCATGCTACC 

TAAAATCGAATCTGCCATCTCCTTTATTGAAAATAATCCTAACGGTAGCGT
GCTCATCACATCATTAAATCAATTAGATGCAGCACTAGAAGGTAAAATTGG 
CACACTCATTACAAAGTAACGTTTTAACGTATAAGTAGAAATTAATTTAAT 
CATATTCATACAGTGCATCCTGTTATTCCACTCCTCTCCATTTCATACAGG 
GTGCACTTCCTTAATCAATCATGTAAAATACTAACCAAATTCAATTGATAG 

GCTTTTATTTTAGGTTAACTGTCGAAGGAGATGAAACCGTTGGAACAAGCG
ATCAATGATAATAAAAAGAAAAAACGTTTTAACTTTAGAATGCCAGGTGCA 
TTTATGATTCTCTTTATCCTAACAGTTGTCGCAGTTATAGCAACTTGGATA 
ATCCCCGCGGGTGCATACTCAAAACTTTCATATGAACCTTCATCCCAAGAA 
TTAAAAATTGTCAATCCTCATCATCAAGTAAAAAAAGTTCCTGGAACACAA 

AAGGAGCTTGATCGATTAGGAGTTAAAATCAAAATAGAACAATTTAAATCT
GGTGCAATTAATAAACCCGTTTCAATTCCTAATACTTACGAACGTCTAAAA 
CAACATCCAGCTGGTCTTGATCAAATTACTAGTAGCATGGTTAAAGGAACC 
ATCGAAGCCGTCGATATTATGGTCTTTATACTTGTTCTAGGTGGACTGATT 
GGTATTGTTCAAGCGAGCGGGTCCTTTGAATCAGGATTGTTAGCACTTACT 

CAAAAAACGAAAGGCCACGAATTTATGTTGATTATGTTCGTAGCAATTTTA
ATGATTCTGGGTGGAACACTATGTGGCATTGAAGAGGAAGCTGTAGCGTTC 
TATCCTGTACTCGTTCCAATATTTATTGCGCTTGGATATGATTCTATTGTC 
TCAGTCGGTGCAATTTTCTTAGCAAGCTCTGTGGGTAGTACATTCTCAACA 
ATCAACCCATTCTCAGTCGTCATTGCTTCTAATGCAGCAGGAACAACTTTT 

ACTGATGGTCTTTATTGGAGAATAGGCGCTTGTATCATCGGTGCCATATTT
GTTATTAGTTATTTATTCTGGTATTGTAAAAAAATTAAAAAAGATCCTAAA 
TCCTCTTATTCTTATGAAGACAAAGCAGCATTTGAAAAACAGTGGTCTGTG 
CTCCATGATGACGGTTCTTCTGAGTTTACATTACGTAAAAAGATTATTCTT 
ACGCTTTTCGTCCTACCATTCCCTATTATGGTTTGGGGCGTCATGACACAA 

GGATGGTGGTTCCCAGTCATGGCATCTGCATTCTTGATCTTTACCATTGTC
ATCATGTTTATTGCTGGAACAGGACAATATGGTTTAGGCGAAAAAGGCACT 
GTAGATGCATTCGTTAATGGCGCTTCAAGTTTAGTAGGTGTATCTTTAATC 
ATTGGTTTAGCTCGAGGAATCAACTTAGTATTGAATAAAGGAATGATTTCT 
GACACAATCTTGCACTTTTCATCATCTATCGTGCAACATATGAGTGGGCCT 

TTATTTATCATTGTTCTGCTCTTTATCTTTTTCTGTTTAGGATTTATCGTG
CCGTCCTCATCAGGATTAGCAGTACTATCTATGCCTATCTTTGCGCCATTA 
GCTGATACA6TAGGTA 



Sequence 3461

step.1003e05.cons.ok

CTATTTCATGATGATCTGTCACTATAACATCTACACCTAATTCTTGTATGG 
TACTTATTTCATGATGTCCCTGTATACCATTATCTACAGTTATTATTAAGG 
AAATCCCTTCGTCATAAGCATTTTTAAATGCTAATTCATTAGGTCCATATC 

CTTCTGTAAATCTATTGGGAATATACCACCCCACCTGAGCGCCAAGTAGAC
GTAAAGTAGACACTAGAATTGTAGTAGATGTAACACCATCTGCATCATAGT 
CACCATATACTAGTATTCGTTCATTTTGATCGATAGCTAATTTAATACGAT 
CAATGGCTTTTTGCATATCACTTAATAACATATAATCATGACTAATGTTAG 
TGCCTTCAAATAGATTTTGAATCTCCTGATTATCCACAATATTTTTACTCT 

CTAATACCTTTTTTACTATAGGTGAAAGTTTAAATTGATCACTGCTACTCT
CACTTATAAAGGTATTCGGTTGAGTTATATCCCAATTATATTTTGATTTAA 
TCATTATAATCGTCCTCATTTCCACAGGTCATTATACCGAAATTTTATTGA 
AATAAAAAGTACTTGCATTTTTTAAAAGAATATATCATTTCAATTCTTCAC 
AAAACTTTTATCATTCAATTTGTTCATTAACTGCTATCTTTATGATTAAAC
GCTAATATATAAAATCAAGAGGCAGGGGCATAATTCCTAGCAAAATAGACA
GTAAATGAGTTTTTATAAATTCATTTACTGGCTTCTTTATTTACAATACCT 
CGTATTGTTGGCTCGCTTTCITAGGGGACAGCTTCAGCCTGTAGTCTTCAA 
CTTGTCCTGCTCCCTCAATAGTCTCGCCAAAATACTTTGTGTTTATATGTA 
ATTTCACATTGAAATACTTTAAAAAATAAGGTCTTTCGTATAATTTAATAA
ATAGCAATAAACTAAATTAACGAGGTGCCTTATGTATAAAGATTATAATAT 
GACTTAACTTACTCTACCAATGGAAATTTCAGTTCTTATCCCCACACATGA 
TATTTCACGACATGTAAATGATATTGTAGAAATAATTCCAGATACAGAATT 
CGATGAATTCAGACATCATCGTGGCTTAATATAAAAAATAATAGCTCAACG 

AGCTGAAAATAATCAAAAAATTATAAAAAAGACAATTTCTATATTATTTCA
ATAGAAATTGTCTTTAATTACTTATCTTGGACCTTTTTGTTCCAGCCTCTT 
GTAAATTTTTAAAATTATTAAAATGTGAAGAAGTTAAACTAAGATTTTTTC 
ATCGTTAGATTTCTTCTCTTTGTGTACTACTAATTTATTATTTTTTGTCTT 
TTTAAACTGTCGTTTCTTAAGCATGCCCCATAATGGTACAGCAATGAAAAT 

TGACGAGAATACACCAGATAATAATCCAATTAGTAATGCTAAAGAGAAATT
AAATATTGTTGGTGCACCTAATATTAATATTGCAACTACAACTACAACTAC 
AGTCAACACTGTATTAATAGAACGTGTCATAGTTTGTCTAATAGAGCGGTT 
GACTATATCATCAATTTGATCAGTATGCGTAATTACTTTAACTTTATGCAG 
ATTTTCTCGAACACGGTCGAAAGTTACGATTGTATCATTGATTGAATAACC 

AACGATAGTTAATACTGCTGCAATAAATGTTAAATCTACTTCTAATCTAAA
CAAACTAAAGATTGCAATGATGATAAATACGTCATGTAATAATGCAAGTAC 
AGAAGATAGACCCATACGCCATTCAAATCTTAGTGAAACATAGATAATAAT 
GCCTATAGAAGCAAGTATTAATGCAGTTACAGCATTTTTAGCTAGCTCTTG 
GCCTATGAGAGGTGAAACGGTATTAATTTGTGGATTATCTCCAAATTCAGA 

TTTCACCTTAGCACTTAATTTATTGTCTTCCTCACGTGATAAATCTTTTTT
AAACTGAACTGTGGCATTTTTATTTCCACTACCATTAATCTGTATTTGATC 
TGGTTCCAATCCAACTGATTTTACAGTTTTCTCAACCTGTGCTTGTGTTAT 
AGCATTTTTAGATTGAATATCTGCTCTTGTTCCGGATGAGAAATCAATACC 
TAAGTTTAATTTAAATATTGAAATGATAATCAAACCAATAATTACAATTAA 

AATACTAAGTGAAATAAGTGGCTTAGCTAATTTAACAAAGTTTAACCTTTC
ATATGATGTTTTTAAATCATGTACATCTTTACCTTCATTAATATCATGTCT 
ATCCTTCTTCTTAACACCAAATAACCAGTATTGTTTTTTAAAGAAGTTTGA 
AGATACCAGTAATGATAACAACCCTCTTGATAAGAATACTGCGGTTACAAA 
TATCATTAAAATACCTAAGAGTAACATGGTTGCGAAGCCTTTGACTGAACT 

TTCTCCAAAGAAGAAAAGCACAGCTGCAGCGATGACAGTTGTTAAGTTGGA
ATCAAATATAGTTAAGAATGAACTTTTATTTGCTTTTGAATACGCTTGTTT 
AAGCGTGCGTCCAATTCTTAGTTCATCTTTAATACGTTCATACATTATGAT 
ATTGGCATCGACAGCCATACCTACACCTAAAACTAATGCCGCCAATCCAGG 
TAGAGTTAATACACCTGATATGAAATTGAATGCGACTAAAGTTAAATAAAT 

ATAAGTGGTTAAGGCAATGATTGCAACTAAACCAGGCAAACGATAGAAACC
AAGCATAAATAAATAAATTAATGCTATACCTACAATTGATGCAAACATGGT 
CTTATCAAGAGCATCTTGACCAAATTGTGCACCAACAGAGTTAGAGTAAAT 
TTCTTTTAAATCAACTGGTAATGAGCCGGCATTTAATAACTCAGCTATTTG 
TTTCGCTTCTTCAACACCTTTTTTCCCATTGAAGCCACCTGAAATTTCAAC 

ACTACTAGAATTAATAGGTTGGTCTACACTCGCTGCAGATATAAATTTAGG
CTTTTTACCTTCTTGTTGCTTTTTAGCTTCTTTCTTGTAACTATCGCCTTT 
TTCGAAATCTAACCAAACTACCATGACATTGTCACGTTTTTTAGAAATCTT 
TTCAGTTACTTTCTTAAATTTATCTTTACTTTTTACTTT7VAATGTAACTGT 
TGGTTGATTAGTTTCTTGTTTAAATTCTTGTTTAGCAGAGCCTTGTTTAAT 

GTCTGAACCAGACATTAAAACATGATCTTCAGCATCTCTAATTGTTAAATT
AGCTTGTGTCGATAATAATTTACGCGCTTGTGCTTGATCCTTGATACCTGC 
TAATTGTACACGAATTCGATTTGGATCTTCGATTTGTATTTTCGGTTCTGA 
TACACCTAGAACATTTACACGATTTTCTAATGTTTGAGATGTAGCTTGAAG 
TGCTTTTTTATCAATTTTATCTCCTTTATTTAAAGGATCTACTTGGAAGAG 

GACTTCAAAACCACCTTGCAAATCTAGACCTAAATTAACATTTTTAACTAC
ATTCTTATACGTCATACCCTTTAGAGCACAGTGGCGATGATAT

ATTTCTTAAAATGTTGTAAGTCTTATGTTTGCTTTTATTTCACGAATCTTT
TTATTTATTTCAAACAATTGTTTACATGTGATTTTAAATCATCTCTTCTAA 
ATACAATTAAGAATACTATACCTGTATCTGACATTCGTCAATAGTTAAATG 
AAACTCTTTTCACATCGTCATGTATCATTCATTATTTTAGTAGACTGATAG
CATAATGTAGCAGTTTTATTTATTTAATTAAAAATATTAAATGGAAATAGA 
ACGTAAGGTGACATTACAAGCAGTTAATTATAATAATGTTGAAATTCTCTC 
GCCTCTCATCCGAGAAAAATTGTATGTATATCGTCACAATGTATCTACTAT 
TACACAGAACAAAATATTTAGTTTC7UVCAAGTTGTAGAAGCGGGACACTAA 

TTATTTAAAAAGTGTTGCATATAAGCACATCTAAATATGACGTAATTGTTG
CAAATTTAAAATAACTTGAAATGTGATAAT6ACTTTGCTATATTAAAATTA 
AGTTAATAAAGTATTGTTCGTAGGACAAGTAACATAGATAGTTCGATTTCA 
GAGAGCTTGTGGTAAGTGAGAACAAGTAATCGACATTCATGTGAATCTACC 
TACTATATGTGAACAATCGGTAATAACCGTTATTTTAGTTAAGCGCAATTT 

GAGGTAAATGTTTTTAATTTACTTGAAATTGTTAAATAGGGTGGCAACGCG
TAGACCACGTCCCTTGTCTGGGATGTGGTCTTTTTTTATTGTTTTATCACA 
CGAAGTCATCCATAAAATTGAATATATTTTATTTGGGAAAGGATGAAGGTT 
ACATGTTAGACATTCGTTTATTTAGAAATGAACCTGAGAAAGTGAAGAGCA 
AAATTGAATTAAGAGGCGACGATCCTAAAGTTGTCGACCAAGTTTTAGAAT 

TAGATGAACAACGCCGTGAATTAATCAGTAAAACTGAAGAGATGAAGGCGA
AAAGAAATAAAGTGAGCGAAGAAATAGCTCAAAAGAAACGTAATAAAGAAG 
ACGCTGATGATGTCATTGCTGAGATGCGTCATTTAGGTGATGAAATTAAAG 
ATATCGATAATCAACTTAATGAAGTAGATAATAAAATTAGAGATATCTTAA 
TTCGTATTCCTAACTTAATTAATGAAGATGTACCTCAAGGTGATTCTGATG 

AAGAAAACGTTGAAGTTAAAAAATGGGGTACGCCACGTGATTTTGAATTTG
7VACCTAAAGCGCACTGGGATTTAGTTGAAGAATTAAAAATGGCTGACTTTG 
AACGTGCTGCTAAAGTATCTGGTGCTCGTTTCGTATACTTAACTAAAGATG 
GCGCATTACTTGAACGTGCTTTAATGAATTACATGTTGACAAAACATACAA 
CGCAACATGGTTATACTGAAATGATGACACCTCAATTAGTGAATGCTGATA 

CGATGTTTGGAACAGGTCAATTACCTAAATTTGAAGAAGATTTATTTAAAG
TTGAAAAAGAAGGCTTATATACGATTCCAACTGCAGAAGTACCTTTAACAA 
ACTTCTATAGAGATGAAATTATTCAACCAGGTGTACTACCTGAATTATTTA 
CAGCTCAAACTGCATGTTTCCGTAGTGAAGCAGGATCAGCTGGTAGAGATA 
CTAGAGGGTTAATTCGTTTACATCAATTTGATAAAGTTGAAATGGTTCGTA 

TTGTACAACCTGAAGATTCTTGGGATGCTTTAGAAGAAATGACACAAAATG
CTGAAGCTATTCTTGAAGAATTAGGTTTACCATACCGTCGTGTTATCTTAT 
GTACTGGCGATATTGGTTTCAGTGCTAGTAAAACATATGATTTAGAAGTTT 
GGTTACCAAGTTACAATGATTATAAAGAAATCAGTTCTTGCTCTAACTGTA 
CTGATTTCCAAGCACGTCGCGCAAATATCAGATTCAAACGTGATGCTGCTT 

CTAAACCAGAATTAGTACACACATTAAATGGTAGTGGTTTAGCAGTAGGTC
GTACATTTGCAGCCATCGTTGAAAACTATCAAAACGAAGATGGTACATTAA 
CAATTCCTGAAGCATTAGTACCATTTATGGGTGGCAAAACTAAAATTGAAA 
AACCAATCAAATAATCATATAAACTTAACTAGATGATTACAATATTAAAAA 
GTTATTTAACTGTCGATAAGATTCTATAAACTTATCGACAGTTTTTTAATA 

TATAAAGGCGAAATTGGTATTTTTATATTCAATTAGGGATGGATGTCTAAA
ATAGTTTAAAATGATTTTCACGATTTAAAATACTGGTATAAAAGCGATTTT 
ATAGATACGGTTGTCTGACGTCACACAAACGAATGTCTAAAAAATAGTGAA 
TATTTATTGTCATAGATTGAGTTATAAAATCATGAAAATAGTACATGAAGT 
AGCAAGTATTTATCTTTTAGATAACGCCTGTAAAAATATAAAATAGTGGAA 

ATTGATTTTTTAAACGTTATCATTCATATTCAAAATATTTAACTTAGTTTG
TCAAACAACGTTTATTTTTATATAACCATAAAATAATATCATAAAGAAGCT 
ATATGAAATTCGAAGTATAATGCCAATGTTTCATATAGCTCCTATTCTAAA 
TATTAAAATTTAATAATTAAATGTATCAATCGTCCTATTCATTTGATGCAT 
AGCCTGAGAGTGGGTAATCATCTGAGTAATCATTGTATTTGTATGTTTTTC 

CCCAAGATTTCACGGACCATTCTGGTGACTTCACTCTATGTGTATCATTAT
GCCATGAGCTTGGTTGTGCATGGTCATGATCTAATAAAATATAATCTAAGT 
GTTGAGGTTCTAATTTAGGATAATTATATTTCGCAATACTATTACTGCTAG 
TATCCCAACTATATGCATTACCATCGAATTGAGTAGGTAATGAAACATTTA 
AGTTATTTGCCATTTGTTGATATTCATCTGAATCTTTAATGACATTTAAGT
CACCACCGATATAGACGGGTTCATTTTTAGGGATATTCTTGTCTTTGATAA
ACTGTTTAATTTCACTCATTTGACTCTGTCTAATGTCTTTATCTTTTCCTT 
TAAAGCATGTTGGATCTTCAGCTTGTAGATGTGTTCCGATAAT 


Sequence 3463 

step.1003e08.cons.ok

ATGCGTTTATCAAACGTCTCATTAAACAATTTGGTAAACCTCAAAAGGTAA 
TTACAGATCAGGCACCTTCAACGAAGGTAGCAATGGCTAAAGTAATTAAAG 

CTTTTAAACTTAAACCTGACTGTTATTGTACATCGAAATATCTGAATAACC
TCATTGAGCAAGATCACTGTCATATTAAAATAAGAAAGACAAGGTATCAAA 
ATATCAATACAGCAAAGAATACTTTAAAAGGTATTGAATGTATTTACGCTC 
TATATAAAAAGAACCGCAGGTCTCTTCAGATCTACGGATTTTCGCCATGCC 
ACGAAATTAGCATCATGCTAGCAAGTTAAGCGAACACTGACATGATAAATT 

AGTGGTTAGCTATATTTTTTTACTTTGCAACAGAACCTATAAAAAACATGG
GCTTAAAGGGTTAATTTATATAGTATTACTGCAAACATTGATTTGGGTCAA 
AAATTAGGGGTATTAAAAAAATGAATGAATCATAATTTTATCAAGCTGATT 
GGAGAGGTTAAAATGCATTATATAAAACTTTAGAGTATCATTAATATCTTC 
ACCTCAGATATACAAAAAACTACTAGCATATAAAAGATATATCTTTTATAT 

GCTAGTAGTTTAAAAGCTTAAACAATCAATCGTCCATTTTATTAATATATT
CTTTTTCTTTATAGTATAATTTTTCTCGTCTTTCTTCTCGTAGTTTTGCAC 
GTTTTTTTAATTCTTCTGGCGTATTAAACTCTTTTTGGCTTAATAAGATAG 
TAGATTGTCTAGATGCATCGTCTTTCCTGTACATATATGCACCATTACGGT 
AGGCCGCTCTACTTATCAAATGCATTCCAACCGGAGATGTAAGATTGATAA 

AAACTAGTGATAATAATAATCTGACACTGAAAAAACCTGAATTCACAATAA
AATAGATCAGTACACCAACTACAGTTAGTAATACTGACAATGTAGAACTTT 
TCGTTGAGGCGTGACTTCTTAGAAAGACATCTTGAAATTTTACAATCCCTA 
TTGCACTAATTAATGCAATAATACTTCCTAAGAAAATAAGTATCGAAGCAA 
TAAGACTAACGATGTCTTTTATGATTTCCATTGAAGACACGTCCTTCCCCG 

ATGAATCTTGAAATTGAGACCGAACTGACAAACGAAATAATTGCAATTAAC
ATGATTGAATCTAAATAAGACACTGAGTTAAAAATAACGCTCATCACACCA 
ACAATAGACATAACAACAGCACTCGAGGCATCAAATGATACAACTCTATCA 
GCAGTAGTGGGACCTTTAATTAATCTGACTAGACAAACAAGTAGTGCCATA 
CCAAAAATCACTAATGCACTTATAATAAATATTTGAGTGAACATTTCAATC 

ATCGTGTCACCTCCAAAATTAAATCCTCATACTGCTTAATACTTTTTAGAA
GATTTTCTTTATCTTTTTCTGACACATCAATACTGTGAATAAAAAATTTAT 
TAGTATTTTTAGAAATTCGAATAACTGTCGATCCTGGAGTAATAATAATTA 
AAATCGTTAAAAAAGTAATAGCCCAATTACTTTTTAATGAAGTTTCATATG 
TGAGTAAACCTGGATTAACTTCATTCGTCTTAAATAAGATGTAATTTATGG 

TACTTATACTAGAAGTAATAAGCTGGTATAGGTATACAGCTAAAAATTTAA
TAGCCACCCATATCTTTTTCAAATAAAATTCTTCACCAAAAAAGCGATGCA 
GAATATAAATCACAATTAAACCAATTAAAAATCCAGCAAAGAAGGTTGTAA 
ATTTAAATTCATCTTCATCTTGAAAGAGTACCCAAAGGAACGCGATAACAA 
TATTTAATACAACTTGCTTCACTTATGATTAACCTCCTTCAAATGTGTATT 

TACATTCTTTTGAAAGACATCTTCTTTCATATTAAGATTTGTTGCATCCTC
TGTTACTTTCAGAACAACAGGGGCTGCAATACCCATCGCTAATACCACTAC 
AACTAAAACACTAAGTAAACCTTTACGATAAATAGGTAGTGGTCTAAATTG 
TACTTGTTCTCCGTCAGCATCACCAAAATACATTATAAACATCACTCT 


Sequence 3464 

step.1003e09.cons.ok

GATATTGAAAAATAGGCAATAAAGTTATTAAAAACATAATAACTAACGTAG 
CACTATGTTTAGACATTATTTTTAAAACCAACACCCCTCCTTTCATATAAA 
AATTTTGCTCCGTTTTTCATATTAGACATTATCACTCATCTTAACAATAAC
TATTCTAATTTTTTCGAATTAGACACAAATATAAAAATAGAAATTTTATCT 
AGTAAATTAAATACAAGTATATATAAATCATAAAGTTAATAAATCAATTAA 
GTATGTAGCAAAATGGCAATTTGTAATATAAAATAAAGTTTATTCATTTTA 
GAATCACCTTACTAAGTGATCAACAAAATTTCGAAGATTGAATAAGTTGTA
TAAAGTTTATTATAATTAACCATACTATTAAATCATAAGTAATCTGCATAT
GCCTTAAAGATAAATTATATATTTAATGATTAATGTACAATCTAAAGTTAT 
TAGTAGCTGGATTGATAATATAGCAAAATTCATCTACTTTTTCTTATACCT 
AACTATACAATAGGAATGTGATTCAACTTTATATATCATATTTAGGCCAAG 
AGAGTATGATATTTAAAAATTATGTCCGAAGTTTAAGAAGCTTGATGCCAC
AAATTTAGAAGTTTAAAATTTATTTAATTAACTTTAGTATCATCCTTATTC 
TAGTGGACGCTGAAAGGAACTATTAATTATGATTACAGCTTATAAACACGC 
ACTTCATCATCCTTCACAAATTATTGAAACTGAAATCAATCATAGTGCTTC 
ATGGATAAATGTAGTAGAACCAGATAGAGAAGAAATTGAAGGTTTGATGGA 

ATTTTACAATATACCTGAGGATTTTATTCGAGATCCGTTAGACTCTGAAGA
GAGTGCACGTATTGAATATGACGAAGATACGGGATACTCGCTAATTATTAT 
TGACTTACCTATCGTCAATTCTACCAATCGGCGTGTCTTGTCGTTTGTGAC 
CATCCCATTAGGCATTATTATTGGAAATGGTATCGTTATGACTGTTTGCGA 
TGCTGAAAATGAATTTTTAGAAAACTTTGCTAAACAAGAAGATATTAACTT 

GAAATTTCACAGTCGATTTGCACTTGAAATACTAACAACAATAGCAAATCA
CTACAATAGAAATTTAAGATTGCTTAATAAATCTAGGATTCGTATTGAAAG 
AGAACTCAAAAATAACATTACTAACAAGCAGCTTTTTAAACTAATGGAAGT 
AGAAAAAAGTTTAGTATACTTCTTAGCTGCACTAAAAGGTAACGACACAAT 
TATTAAAAAACTCTTTCGTCTTCCTGCAATCAAACGTTTTGAAGAGGATGA 

AGAGCTACTTGAAGACTTGGTGATAGAAAACAACCAAGCTATTGAAACGAC
TGAATTACATCAACGAATACTTGAGAGTATCACTTCTTCGTATGCTTCATT 
ATTATCTAATGATATGAATAACATCATGAAAACATTAACGTTGTTTACGGT 
TCTTTTGACCCTTCCTACACTCGTCTTTAGTTTCTTTGGTATGAATGTACC 
CTTACCAATTGATGATCATAGTTACGTGTCTTGGATTATTGTTGTGGGAAT 

TTCACTTATTCTAGTAGCTATCGTTAGTATCTTTTTATGGAAAAAACAAAA
ATTATAACAACAAATACCCTTAGTTATAGATTAGCACTCAATCTAACTAAG 
GGTATTATTATATGTTTAAAATTGTGTCAAAATTTTAAGTCTCTACTATGA 
TTCAACAGCTGTCTTATTTGTTTTTCTTCGAGAAAGTACTAAAGTGACGAC 
AACTGAAACAACAAATGCAATTAGCATTCCTATTAAGTAATGTAACCAACC 

AGCGTGTGTAGGATTTATAGATATAAATCCAGGTAATCCAGCTGTACCAAG
CGCTATCGCTTTTACTTTGAAGAATGAAATATAAGCCGCACCTATACCTGA 
TCCTGCAACAGCACCTATAAATGGATATCTCAATTTAAGATTGACACCAAA 
CATTGCTGGTTCTGTAATTCCTAGTAAAGCTGAAATACCCGCCGCGGAAGC 
AACACCTTTTAATTTTTTATTTTGCTTAATGATAAAGAATGCAGCTAAAGC 

TGCACCACCTTGTGCAATATTTGACATCGTTGCGATTGGGAAGATAAATGA
ACCACCTGTTTTAGTCGCATCAGCAATTAATGTCGTTTCAACTGCAATAAA 
GCTATGGTGCATTCCTGTAATGACGATTGGCGCATATAATAAACCAAATAT 
AAGTCCACCAATAGCTCCACCAAATTCATATAACCAAGTCAATCCATCAGA 
TAACCAATAACCTAATTGACGAGTGACAGGCCCTACAAATAAAAATGTTAT 

AAATGCTGTAATAAAAATTGATAATAATGGCGTCAATAAATTATCTAACAC
AGTTGGAATAACTTTACGTAACCATTTTTCAATCGTAGCTAAAATATATGT 
TGCTACGAGCATAGGTAATACTTGTCCCTGATAACCTACTTCATTAATATG 
TAGCCCAAAGACATCCCAGTGTGGAATAGCTTTTCCTTCTTCTAAAGCTTT 
TGGAAAATCATATGCACTCATCAATCCAGGATGAACAAGTATCATACCTAG 

AGCTGCACCTAAATAAGGATTACCACCAAATCGCTTAGCTGCACTAAAACC
AATAAGTATTGGTAATAATGTAAAAGGTGCATTTGCAAATATATTAATCAT 
ATCAGCCAAGCCAGAAAATTGACTATGTACATCTATGATAGATTTACCATC 
ATAAAACAAATCTTTAGCTGTGAAAATATTATTTAATCCCATTAACAAACC 
ACCAGCTACAATAGCCGGAATAATAGGAACAAAGATATCAGATAACATTTT 

CACAAATTTTTGGAATGGATTCATATGCTTAGATGATTTATCTTTAACTTC
TGAAGT 



Sequence 3465 
step.1003e10.cons.ok

CTGATGCAAAGCTCATTGAACACTCACCCTTCCTAACCCTTTTTTCTATTC 
TTATTTATAGTATCGCCAATGAGGTGTAAAGTTATTTTCTAATTCTAAACA 
CACACTTAAGCTATCATATCATAACTACGAATTCGCTTATAATTATTTCTT 
TTTATCACTAGGAGT/^GCGAATTGTACTTGTTAATTCTAAGGCAAGTTC
ATAAATCATTTTTGATAATACTTTTGTGTTATGTCTGACATAGTGTTCATT
AGATATTTCAACTAAATTAGATGCCGTTAAAACTCTAATTCCACTATCTTT 
TAATTGTTCTTTATGTACTGCTACTGGTTTCGAATTCTTTTCTTCATATCG 
TTGTAAAACATCTTTACTATAGGATTCTGAGCTACATATGACAAAATCAAT 
AAATGGTTCACCAACTTGTCGAGTAAGTGCATCAATATGCTCTTTGACATC
ATAATTATCAGTCTCGCCTGGTTGTGTCATAACATTAGATACATAAAGTTT 
TGGAGCAGATGTACGTAATAATGCTTCTGAAATACCTTTGACACATAAATT 
TGATATAACACTTGTATATAATG7VACCTGGTCCTAAGACAATTAAATCTGC 
TTGTTCTAAAGCTTCTATCGCTTCATTCATTGGTTCAACATCACTTGGTTC 
TAAAAACACACGATCTATTTTTTTATGTGTTTTAGGTATATTAGTTTCTCC
ATGTACAATTTCACCGTCTTCCATCACCGCGTTGAGTTGCACACTTGCGTT 
TGTTGAAGGGATGACTTGGCCTTTAATATTTAAAACTTTGCTTAACTCTTT 
AATAGCGTGTCCAAAATCATTAGTAATGTTAGTCATTCCAGCAATAACTAA 
ATTACCTAATGAATGCCCATCTACTTGATTTTCACCAAAACGGTACTGGAA 

caattgagttaatatcgattctgagtcacttaaagcagcaatgacattacg
aatatcaccaggcgctggaatatcx:atgacatctctaattttccccgtgct 
cccaccattgtccgctacagtaacaatggcagtaatgtctattggaaattc 
tctaaggcctctagcaaggacagaaagtccagtgccaccaccaataagtac 

TACATTCATTTGTTTCATTATCTCTCTCCACTTTCAATATGCGCATCTCTA 

TGATGCACATAAACATTATATTCAAAAATCTCGTTAAGATATTCAGCTAAA
CGTTTAGCTAATGCGACTGATCGATGTTGTCCACCCGTACAACCTATAGCA 
ATAACCAATTGCGATTTACCTTCTTTTTTGTAGCCAGGAATCATAAATTTT 
AATAAATCTGTTAATTTATCAAAAAATATTTGGGTTTCTTTCCACTTCATA 
ACGTAATTGTACACTGGCTCATCTAAACCAGTAAATGGACGCAATTCCTCT 

ACATAGTAGGGATTAGGTAGAAATCTGACATCAAAAACTAAATCAGCATCC
ATTTGTATACCATGCTTGAAACCGAAACTTGTCACGTTGATTGTAAATGTT 
TCAAAGTTTTCATCTAAATAAAACTTTGAAATGCGTTGCTTTAATTCTTTA 
GGTTTTAATTTTGTTGTATCAATCACGTAATTAGCGATACTTCGGATTTCT 
GATAGATGTTCACGTTCCTCATTTATTGCATCTATTAATGATCTTTGTCCT

TGTTCATTTAGTGGGTGCGCTCTTCTTGATTCTTTATAACGTGAAATAATT
TTTTCAGTTTTAGCTTCTAAAAACATAACATCTAAAATCACGTCATTACGA 
CTTTTAATAATATCAATTTCTTT/^CTAGAGATTTAAATAATTCCTTACCT 
CTTAAATCTATTGCAATTGCTACTTTTTGCAATGAAGGTCAATGTGTATTA 
CTGGTATGCAATTTGGCAATGAGGTAGCTATGCATGTGAGCAGAGGTGCAG 

TCTTCGGTATGATAGGAGTTTTATGTAGTATACTAGCAACCTGGGGATTAC
TACAAGCTACACATATGTGGCTATTGAGTATTATTGGTGGTTTTGTAGCTT 
GGTTTGTAAGTGCATTGATTATTTTTGAAATTGTGGAATTCATAGCACATA 
AAAGAAGGGATAAACATGGCTGGAAGACCAAAAGATCCAACAATCAATAAA 
AAAATTTATACTGAGATTCAACGTTTATTGGAAACGACTCATTTTAGAGAT 

ATTACTATAGATCAAATTTCTGAGAATACTGGTATTTCTAAAGCAACTATT
TATAGACGTTGGAAGGATAAATCTTCAATTATCATGTCCGCGTTTATTGAA 
CAATCTCAATATATTGCGATTCATAATCAAGATAATTTATATGATGATTTA 
TTCCAGTTTTTAGTAAAAATAAAAGATATCTATAAAACAAAACTAGGTAGT 
GCTGTGATTGAAATATTAATTAGTCATCAACAAATGGAAGCTAGAGAAACT 

TTTATGACTAATTACTTTAATCATAATCGCAAAGTTTTAAAAGAGATTGTT
CGTAAGCACATACAAGAGGAAGAACAAGATTTGTTTATTGATTTAATCTTC 
TCACCCATCTATTTTAATATATTAATTAAACCTGAAACTCTGGATGAAAAT 
TACATTAAAAAGATGTTGAATCAAGTTTTACGTATATATCATTGATATGAA 
AAACGCTCCTTTCGAGGAGCGTTTTTTGTTTTTCATGTAACAGGTTAGCCT 

7VAGGTCAGTCTATTAAAACCTTTAACCATACTATGTCAAAGCATTTTGACA
ATGATAAAGGTGACATCGCAAAAACTATTTGATAGCTTTTATAGAGAAAAC 
CTGATTTGATTTAATAATAATATTAAATGATTTCAAATCATAATTAAGGAG 
TTTAACGTAATGAATGTTAGCAATCAAATCAAAAAATTTAGAGAAAGAGAT 
GGGTATTCACAAGAATTTTTGGCT6AGAAAATGTTTGTTTCAAGACAAACC 

ATTTCAAATTGGGAAAATGATAAAAGTTATCCGGATATTCATAATTTACTT
ATTATGTGTCAACTCTTCAAGGTATCTCTAGACGAACTAGTGGAGGATGAT 
TTGAAAAACGTGCAAATCAAAAATATTAAA7AAGAATTAGACTTTTGGACA 
TGGATGATGATAATTCCAATGGTATTAGGAAGTGTTCTCATAGGTCCTATG 
ATTCATTACTTTCATTGGTTAGGAATAGGAATAACATATTTAGTATTTTTA
ATAGGTATCATTGCTTCTACAAAATTAGAAAAAATTAAAATCAAAAATGAT
ATTGAAACATACGACAGGATATCTTTAGAGCACAGTGGCGATGATATC 



Sequence 3466

step.1003e11.cons.ok

TATAAATTTAAATGTACATTAGCAAATGCATCGTTATCAAGTCCGTCTACA 
GGGTCACCTACTACGACAAGTCCTGTTGCGGTATTATCTGATATTAAATCT 
GATAAAGTAAAATTTGGAAACCAATTTTTATATCTTGCATCTGGAATTTCA 

ATGCCTGGCGCGATACGGGTATGATATCTAATTGTTTCTAAATCAGCATCA
TAAGGTAAGTCTTCCTGCACTATAAAGATAATTTCTGGTTCTAGTAATGGT 
GAAAATAATTCTGAAAGAGAGACTGAGGCACCATCATTAACAATTTGGTTA 
GATAAGAGTGTTCCATATGCAGGTTCGTTAGTGTTAGCAATTGCTTGCGTT 
GCCTTGCTAGTCATACTAACTTTATACCCTGTAACAGTCGAACGGTCTTTG 

AAAGTTAATTGGTCAATTAGTTCTTCTTGTACATGATATGCTTCTTCTTCG
TTTAATTGATAGTTCTCACTAATGAAATCGATGGGTTTTTTATATCTATAA 
GCTTTAAATAAAACTTTAGCAACTTCTTTATTTGTTAGTGTCATATTGTAC 
CCCTCCCCTAAATTATCTGAAAATTTATACTATTATGATAACTTTTAAATG 
TGAAGTGTGCAAATTGATTTAAATATACTTCATTTAGGGTATAAAATAATT 

AATAATGTAAATTTTGATAGGAGTGGGTATAGTGTCAGATTATGTTTATGA
GTTGATGAAACAACATCATTCAGTTAGAAAATTTAAGAATCAACCACTTGG 
TTCTGAAACGGTAGAAAAATTAGTAGAGGCGGGACAGAGTGCTTCTACATC 
CAGTTATCTTCAAACTTATTCTATTATTGGTGTTGAAGATCCAAGCATTAA 
AGCGCGTTTAAAGGAA6TGTCAGGTCAGCCTTATGTTTTAGATAATGGTTA 

TTTATTTGTATTTGTTTTAGATTATTATCGTCATCATTTAGTAGATGAAGT
TGCGGCGTCAAATATGGAGACATCATATGGTTCTGCAGAAGGACTATTAGT 
AGGTACAATAGATGTTGCATTAGTTGCGCAAAACATGGCAGTTGCTGCCGA 
AGATATGGGGTATGGAATTGTTTATTTAGGGTCATTGCGTAATGATGTTGC 
GCGAGTGCGTGAAATTTTAAATTTACCTGATTATACGTTTCCGTTATTTGG 

TATGGCAGTAGGTGAACCTTCTGATGAAGAAAATGGGTCACCTAAACCGCG
CTTGCCATTTAAACATATTTTTCATAAAGACCAGTATGATGCGAATCAGCA 
TCAACAACGTAAAGAATTGGAAGCATACGACCAAGTAGTGAGTGAATATTA 
TAAAGAACGTACTCACGGTGTGCGTACAGAAAATTGGTCACAACAAATAGA 
AACATTTCTAGGACGTAAAACACGTTTAGATATGTTAGATGAATTGAAAAA 

AGCAGGATTTATTCAAAGATAAAAAGAACCTGAGACAGAAATACATGTCTC
AGGCTAGGGTGGGGCGATATTTTCAACACGAATGTTGCCCCGCTCTTTTTA 
TAATTGTATGTCGTAATGATCAATGACCACTTTTCTTACATTTAATTTGCA 
GATAAATCGCCGTAATCATTTGAGTCAAATGTTTTTTTGTCCCAATGATTC 
GTTAAGCGTGCGGTACCTGTTCCAGCTAGCATTGAGTCATTAACGTTAAGT 

GCTGTACGACCCATATCGATGAGAGGTTCGATAGATATCAGTACCCCTGCG
AGAGCAACTGGTAGATTAAGTGTAGATAATACGAGTATTGATGCGAATGTT 
GCCCCGCCACCTACACCTGCAACGCCAAATGAACTTATAATAACAACAGCA 
ATAAGTGTAACAACAAATTGGAAGTCAATTTCTACATTTGCTACTGGTGCC 
ACCATAACTGCTAGCATAGCAGGATAGATTCCTGCACAGCCATTTTGCCCT 

ATGGATAAACCAAAAGTTGCAGAGAAGTTTGCAATTCCCTCAGGTACACCT
AAACGTTTTGTTTGCGTTTGAACATTTAACGGTAATGCACCTGCACTTGAA 
CGTGAAGTAAATGCAAAGATTAGTACTTCTATTGTCTTTTTCACGTATTTA 
ACGGGATTGATACCTAAGACACTCAGTATAATTAAATGGATAATATACATT 
GTGATTAGAGCTGCGTATGAAGCAATTAAGAATTTACCTAACGTCCAAATT 

GCAGAAAAATCACTTGTCGCAAGAGTAGAAGCCATAATAGCTAAAATGCCA
TAAGGCGTTAATCGTAAAACAAAAGTTACGATAGCCATAACGATAGAATAG 
ATTGCTTCTATACCACGTTTAAGTAAGCTTCCATGTTCCGGCTGTTTTCTT 
GCAACTCTAAGATAAGCAAAGCCCACAAACGTTGCAAAAATAACAACTGCA 
ATTGTCGAAGTTGTACGTTGTCCTGTGAAATCTAAAAATGGATTGCTTGGG 

AATACTTCGAGAATTTGTTGTGGTAAAGTGTTTGCAGTTAAATCTTTGGCT
TGTTTTGAAATTTCTGTACCACGTGAATGTTCTGCACTACCTAAATCAATA 
GACGATGCATCTAAACCAAAGATCAAAGCGTAAAAAATTCCAACGATAGCT 
GCAATGGCTACAGTACCAATTAAAAACATAAAAATATAAGAACCGATCTTA 
GCGAATTTTTCACCAATTTGTATTTTGCTAAAAGCGGCAACAATTGAAATG
AATATTAGTGGCATGACAATCATTTGTAATAGTGCAACATAACCATCTCCA
ACAATACTAAACCAGTCTGTTGATTGTTCGAGAGTTTTAGACTCTGCACCA 
TATATGAGGTGTAATACAATGCCAAATACGATACCTATTGCTAAAGCGGTA 
AAGACACGTTTAGGGAATGAGACGTGTTTTCTTGCCATAATATTTAACATT 
ATTAGGAAAACAAGT/yVCACGATGACATTAATTACTGTAAAAAGTATTTCC
ATATCGATA 



Sequence 3467 

step.1003f01.cons.ok

TATAAGATAATATAAACTAAGCCT6AGACACCGATGAGTAATCTCAGGCTT 
TTTTACGTATTGAAAGTGATTGAGTAATTAAATAGGGTGGATATCAAAAAT 
GATGTTGGTCGATATGTTAGGTTCTTTAAGAGAATGCAATGCCAGGCTCAA 
TATCGACGTAGTTATGAAGCGGTGTGTATTTTAAATGGCCGTTAGTATCAT 

GTATAAACGGACTATTGGAACTCAACTTTTAAAATTGATATAGTATAATAT
AAAGGAAATTTAGAGGACGTGACGTTTCGATATTTGAATAACTGTATTAAG 
GT^TATCGAAACTTTAAAATATCTACGCAATTTAGGATAAAAAATCAACAT 
TGAGTGAAAGCCAACTTAGTCGAAGCATCTCAATAAAGCGTAAGACACTCG 
AGTTCCCCCACACCCCTATAAACAAAATAACTATGGTACATATGAGAAAGG 

ATGAGTCGCTTGATACGTATGAGTGTATTAGCAAGTGGTAGTACAGGTAAC
GCCACTTACGTAGAAAGTGAGAAAGGTAGCCTACTCGTCGATGTAGGTTTA 
ACTGGCAAGAAGATGGAAGAACTTTTCAGCCAAATCGACAGAAACATTAAG 
GATTTAAACGGAATTTTAGTGACACATGAACACATCGACCATATTAAAGGT 
CTTGGTGTTTTAGCACGTAAATATAAACTTCCGATTTACGCGAATGAGAAT 

ACATGGAAAGCGATAGAGAAGAAAGATAGCCGCATTCCAATGGATCAGAAA
TTTATCTTTAATCCATATCAAACGAAATCTCTTGCAGGATTTGATATAGAA 
TCATTTAACGTGTCACATGACGCGATTGATCCACAATTCTACATCTTCCAC 
AATAACTATAAGAAATTTACGATGATAACTGACACTGGTTACGTTTCAGAT 
CGTATGAAAGGTATGATTCAAGGTAGTGATGTCTTTATGTTTGAAAGTAAT 

CACGATGTCGATATGTTACGCATGTGTCGCTATCCATGGAAGACGAAACAA
CGTATTTTAAGTGATATGGGTCACGTATCCAATGAAGACGCGGGTCTTGCG 
ATGAGTGATGTCATTACAGGTAATACGAAACGTATATACCTCTCTCATTTG 
TCACAAGACAATAATATGAAAGACCTCGCACGCATGAGTGTTGGACAAGTG 
CTCAACGAACACGATATCGATACAGAGAAAGAAGTATTGCTTTGCGATACC 

GATAAAGCACAAGCCACACCGATTTATACACTATAATTGAATAGAGGTAAG
ACAAACGACCTACTTAAGTATTACGCTTAAGTAGGTTATTTTTTTGGAAAA 
GTGATCAGGAGAATTGCAGGGAATGATTATGTGTTAGTTTAAACAGGCTTG 
CGCATGAACGATTGATTAAGGAATTTTTTTGAAATGAAAGCGCATCATGGA 
AGGGGGAGAACGATGTCTTACACGCTTTGTGATTTATATATAATAACATTC 

TTTTACTTTTA7VACTGCTTAAATACCGTTCCTTCTGCGACATTTGCACGCT
GCGCAATTTCTTTAGTACTCGTTTTATCAAATCCTTGTTCACTGAATAATG 
CAATAGCACTTTCTATGACACGTCTTTGTTTATCGCTTAAATATTCAAGTT 
GAGGCACAATGGTTTCAACTAATGACTTAATATCTTGGTTCATATTGACCT 
CCTAAACTTTTCTATATCTTTTCATGCCTATAATATTTAAAATCAATAAGA 

AAATGAAAAATGCGAATAACACAATTAAATAAATGTAAATATCGTTGAATC
CATAACCCTTGATCATAATATTTTGCATCGTTTGGCCGGTATAGAATAACG 
GCATGATATGTGAAAAGTATTGTAATCCTTTATTCATTGATTCAATTGGTA 
TAATGCCTGCAAATAGTACTTGTGGCACTATGACTAATGGTATAAATTGAA 
TCATTTGGAATTCTGAGGAAGCAAAGGTAGATAATAATATACCGAATGTCA 

CAGCGACAAGCGCTGTTAATATTGCCGTTAATAGTACGAACCATATCGAAC
CTACTAAGTCTATATGCAGAATATAAATTGCATATAATACGACAACTATTG 
TTTGGATAACGCTAAAACTACCATAACCGAAAACATAACCAAAAATAATTT 
CACTTCTTTTTATTGGAGAGGCAAGTAAACGTTCTAATGTGCCAGAAGTAC 
GCTCTTTTAATAAGCCAATGCCAGAAATTAAAAACGTAAAGAAAAAGACAA 

AAAATCCAATTAAAATAGGATTTATCATATCAAAATACGTAGAATCTGATG
AACCATATAAATAGTGCGTTGTTAGTTTATATGGTTTAGCCATATCTTGAT 
GAGGCGTGTCTCCCCCATCCCCGGGCATTTTTTGTTGTATTTTAGTTAACG 
CTTGATGCAATTTAT.TAGTATTATCTTTCATGGCATTCATGTTATGACTCA 
TTAACCATTTTTGATTTGCACCTGTTAGTTCTCCTGCTTGTGTAGGATTAT
CX3TTAGCATAAGTCACTGATACTTTTTGACCATCACTGTGCAAAAATCCTG
TTAATTTGTCGTCTTTAATTTTATCACTTATATCATTGTCATTTTTATAAT 
GTTTAACGTGAATATCTTTATCATGTAATTCAGTCATTAATGAATCTGGTA 
CATCGTGAACCCCAACTGTTACGCCATTCGTATTATCX3GCAACTGTAAAAA 
TGTAATATAGTAGCGACAATAGTAATATAGGTGCAATAAGTAACAGTGCCA
ATGTTCTTACATCTCGAATCGTCTGTTTGAAAATACGTCCTGCTATGTGCA 
ACGAATTCATTTTATCCCTCCATATTCAGAAATACGCCTTCAATCGTATCT 
GTGTGATATTGTTTTTTCACTTCATCTGGTGTACCCGTTGCCAATATCTTT 
CCTTGATTCATTAATACGAGCTTATCACAGCGTGTAGCTTCATCTAATACA 
TGCGTTGTCACTATAATGCATTTATCTTCAGCCTTTGCTTTAGTTAAATCC
TTCCAAATCTTTAGAGCACAGTGGCGATGATAT 



Sequence 3468 

15 step.1003f02.cons.ok 

TACAGAAAATAAGACCTCTAAAATGCTAACTAGTTAGTATTTTAGAAGGCT 
TTTTTATAAAAAAGAGAAACATCGAAGGGAAGATTCATTCGACATTTCTCC 
ATAATGATATTAATTAAGCTTTTATGCTTAGTTGTATTGTTAATTATTTAC 
CTTCTCCGTAGATGTCATTCCATTGCTCTCTTTTATCTAAAAATGCTTGAG 

20 CAATTGTTTTAGCACCTTCTAAAGAATGACTTGCAGCCCAACCACATTGAA 
CCTCATTACAAGCTGGGACTTCGCTAGCATTTAACACATCATGCAATGTTT 
GATCGATAATATTTAATACGTCATCGTAGTCGTCATGATTAATAAATGAAA 
CATAGAATCCAGTTTGACAACCCATAGGACTTAAATCTACTACTTTATCAG 
TATGATTTCTAATGTTTTCTGCCATTAAATGCTCTAAGGAATGTAGACCAG 

25 GCATATCCATATGTTCCTTATTGGGCTGTTTGAAACGAATGTCATATTTGT 
GTATGACATCACCATTAAGACCTTCCATAGTCCCGGCTAGACGAATAAAAG 
GTGCAACAACCTTAGTATGGTCTAAATTAAAGCTTTCTACATTCATTTTAG 
TCATGCATTTATCCTCCTTATTGCTCTACAGCAATTTTTTGTTTAATTCAA 
TAAATAGATTGTAACAATACTCTAATACATTTACAATTTTTCTAAACTAAT 

30 AGACTGTTGTCTTTTAAATTGTAAGAAAATTGTGTAAAAATTAATGACAAC 
CAATATTCAATCCGATAAAATAAGTAATAATTAAAAAAGAATGATTGATAC 
GCAAAGTTTCAAAGGGAGTTAGATACAATGCATAATAAACAAAAGATATTA 
GATTTTATAGAAAATAATAAATATGATTATGTTGA/^TAAGTCATCGTATT 
CATGAACGCCCTGAATTAGGCAATGAAGAAATTTTTGCATCGAGAACATTA 

35 ATTGACCAATTAAGAGCAAATCGATTCGAAATCGAAACGGATATTGCAGGA 
CATGCAACAGGATTTATAGCAACGTATGATTCTGATATGACTGGACCGGTT 
ATAGGATTTCTAGCTGAATATGATGCTTTACCTGGTCTTGGTCACGCATGC 
GGGCATAATATTATTGGTACTGCTAGCGTACTTGCTGCAGTAGCACTAAAA 
GAAGTCGTCGATGAAATTGGTGGTAAAGTAGTCGTTTTGGGATGTCCTGCT 

40 GAAGAAGGTGGGGAAAATGGCTCCGCAAAAGCTTCTTATGTTAAAGCAGGT 
GTCATTGATGAAATTGATGTAGCATTGATGATTCATCCTGGAAATGAAACT 
TATCGTACAATTAATACTTTAGCTGTGGATGTTCTAGATATTAAATTCTAT 
GGACGTAGTGCGCATGCATCTGAAAATGCAGATGAAGCATTAAACGCTTTA 
GATGCAATGATTTCATATATTAATGGTATAGCACAGTTAAGGCAACACATT 

45 AAAAAAGGACAACGAGTTCACGGGGTTATTTTAGACGGTGGTAAAGCGGCT 
AATATTATACCTGATTTTACACATGCGAGATTTTACACTCGAGCTACTTCG 
CG6AGAGAACTTGATGTTTTAACTGAAAAAGTAAACCAAATTGCAAGAGGT 
GCTGCTATTCAAACTGGGTGTGATTTTGAATTTGGTCCTATCCAGAATGGT 
GTAAACGAATTTATCAAAGCACCTAAACTTGATGATTTATTTGAAAAATAT 

50 GCAACTGAATTAGGAGAAGAAGTGATTGATGATGATTTTGGCTATGGATCT 
ACAGATACAGGTAATGTAAGTCATGTTGTACCAACAATACATCCACATATT 
AAAATTGGTTCTCGAAATCTTGTTAGGACATACCCACCGCTTTAGAGAAGC 
GGCTGCAAGTTTACAAGGTGATCAGGCACTCATTCGAGGTGCAAAAATTTT 
AGCATTAATGGGACTAGAATTAATCGAAAATAAACCGTTGTTAGACGAAAT 

55 AATTGAAGAGCATACGCATATAAAGGGGCATGTTAAGTAATGAATAATAAA 
GTAAATGGTCCGTTTCTCACACTTAGTGATTTGTATAATGATGACATTGTT 
TACACATCTCGACCTTCATATGTGTCGAACCCATGGTTGAAACCAGATGAA 
CACCAATCAAATTTCTTAACTGGAAGAGAATTACTTATAGCTAATCGTTTT 
CCTGTCATTGTTCATGAGGCAAGTGTTACAAACAAATTAGAACAACTTTTT 
AATATAGTGGGTAGGGAAATTCCACCACATACTTATAAATTTAAAGATCAA 
GAAACATATGAAAGCTTGATAAGGAATTTAGCTCTTCATCAAGGTAAAAAA 
ATCTACTTTCAATATATTCATGATGAAGATATTTTACCTAAAGAATATTAT 
GCACTTGATAAAGATGTTTTTGTTGCTCTTAATAATAAAGCACGAATTCCA 
5 GAATGGACTAATAACAAATATCTACCACAAAGAGAAATTGTCTCAATTAAA 
GATTTTGAATCTCATATTCAAGCATGGTCGTATCCATTTGTCATAAAACCA 
GGCGATGATTTACCTACAGCAGGAGGATATGGTGTTATGATTTGTTATAAT 
GATACAGATTTAGCTAAAGCTATCACACGCATCAACAATGCATCAGCAGAG 
ACTGAAAATTTAATCATTGAACAAAAAATTAATGCAGTGAATAACTATTGT 

10 GTACAATTTGCTTATTCAGATGATATTGGTATCAAATACTTAGGAACAGCG 
CAACAGTTAACTAATGACTATGGATTTTACAACGGAAATGAAAATGTTAAT 
GATGTGCCTCAGAATGTAATAGACGCTGGTAGAGAGATTATGGAAATAGGC 
GTAAGCAAAGGTTTTTTTGGTGTAGCAGGTTTTGACTTACTAGTAGATGAT 
ilATAATGATGTTTATGCGATTGATTTAAACTTTAGGCAAAACGGATCAACG 

15 AGTATGCTACTTTTAGCAAAAGATTTAACTCATGGATATCATAAATTTTAC 
AGTTACTTTTCTAATGGAGATAATACAAAATTCTATAATGCTATTTTAAAA 
TACGTAGAATTAGGTGTACTTTATCCACTTTCCTATTACGATGGAGATTGG 
TATGGAAAGAATCAAGTTAATTCTAGATTTGGCTGCATTTGGCATGGGGAA 
AATAAAGAATTAATTAATCGATATGAACAACAATTTATATTGGAAGCTGGA 

20 TTATA 



Sequence 3469 
25 ATGAAGCAGCCGCTGGATCAAGATTTTCGGGAACTTTAACTGCATAATTCG 
CATCAACAATGACTTCTTCAGCCATTGCCCCATCAACAGTATATCCGGCAT 
TAAGGACATCCCTGCATAATGTTTCTCTACCAGTTGTGCAATACTCACAAT 
TACCACATGCTGCATACATCCATGCTATAGATACACGATCCCCTACCTTTA 
ATGAATCTACGTTATCAGCAATTTTAATGACACGTCCAATTCCTTCATGTC 

30 CTAATGTTACACCTGTTACGTCACCGAAATCTGCATTTTTAACATGTAAAT 
CAGTATGACATACACCACAATATTCAGTCTTTACTAATGCTTCTCCAGGTC 
TTAATTCTCTTAAACGTTTTTCTTCAATACTTACTCTATGATCTTTGGTTA 
CTACAGCTGCTTTCATAAAAAGCCCTCCCTAGTATTAAAGTGTAATTCTAT 
TACTTCCACCCTTATTTTACACTCTTTTGAAATCGTTTTCCAAAAGTCCTT 

35 TCATTTATACTGAATTTTTTGTAAATTGTATATAGAGCGACATAATATCGT 
CACTTTTATAGCATATTTTATGATTTATTCATATATTTTATTTTTTAACTA 
TATATATATATCATGTCAGTGAAAAGTCATCGCTCTTAATCACACCTTATA 
ATTTTGGGGATGAAGGTGTTTATTTTAAAAATCTTAATTGGAGGGGATAAA 
AGTGAAGACGCAAGTTTTGTACGGAAAATCTGAAATTGAAATCAATGTGCC 

40 TGATGATAGCACAATCATAGAGCCTCAAAACATCGACGCTATTCAAGATTA 
TGAATCAACAATTAAAAATGTATTAAGAAACCCCACAAATTCTAAACCTTT 
AAAAGAAATGGTAAACAGTAATGATATTGTTTCTATTGTTATTAGTGATAT 
TACACGTCCAACGCCCAACCATATTCTTGTACCTTTACTAATTGAGGAATT 
AAATCATGTTCCTCGTGAGAATTTCGTAATTATTAATGGTACAGGGACTCA 

45 TCGAGATCAAACGCGAGATGAATTGATTCAAATGTTAGGTGAAGATATTGT 
AAATTCAGTAAAAATCGTTCACAATCATTGCTCAGAAAAAGAAAGTCTAGC 
TAAAGTGGGACACAGTCAATATGGATGTGATGTTTATTTAAACAAAGCATA 
TGTAGAATCCGATTTTAAAATTGTAACAGGTTTTATTGAACCACACTTTTT 
CGCCGGATTTTCAGGTGGACCTAAAGGGATAATGCCTGGAATTGCAGGTTT 

50 AGAAACAATTCAAACATTTCATAATGCAAAAATGATTGGCGATCCGAGATC 
AACGTGGGGAAATTTAGAAGACAATCCAGTTCAAGATATGGCACGGGAAGT 
TAACCGTATGTGTAAACCTGACTTTTTACTTAATGTTGCATTGAATAAAAG 
TAAAGAAATTACTGCAGCATTTGCTGGTGAAATCTTAGATACACACAAAGA 
AGGATGCGCATATGTAAAAGATCATGCAATGTTTAAATGTGAGCAACGCTT 

55 TGATATTGTTATCGCATCAAATTCTGGCTATCCTTTAGATCAAAATTTATA 
TCAAACAGTTAAAGGGATGAGTGCAGCGAGTAAAGTTGTTAAAAAAGACGG 
TCATATTATTATGGTATCTGAGTGTGCAGATGGCTTTCCTGATCATGGTAA 
GTTTGCCGAAATTTTCAAT^TGGCAGACACACCTCAAGGTATTTTAGAACT 
TATTCACAATCCAAACTTTAAGGAAGTTGACCAATGGCAAGTACAAAAACA 
AGCAAGTATTCAAACTTTTGCCAATGTGCATGTTTATTCAGAACTTACTGA 
CCAACAACTTAAAGACTCGATGTTAATCCCAACCTCTAACATTGAACATAC 
T^TACAAGAATTAGAACATCGATATGGCCGTAAATTAACCATTGGTGTTAT 
GCCACAAGGTCCTTTAACAATACCGTACGTAGAAGATAAAGAATAATAAAT 

5 GGAGGATGGAAACTATGACTGAATTAAATACTAAAGAAATGAAATTAGACG 
CACTATTGAAAGACATGCAGAGTGTAGTAATTGCCTTCTCAGGTGGAGTAG 
ATAGTAGCTTGTTACTGAAAAAAGCGATTGATATTTTAGGTGTTAACTATG 
TTAAACCTGTTGTAGTAAAATCAGAATTATTTAGAAATGAAGAGTTTGAAC 
TAGCGCTTAAACTTGGACAAAGTCTAGGTGTTGAAGTATTAGAAACTGAAA 

10 TGTCTGAACTTCAAGATGCGAATATCGTTAAAAATACGCCTGAAAGTTGGT 
ACTATAGCAAGCGCTTGATGTATAGTCAACTTGAGAATATTAAGAATAAAC 
TAGGATTTAATTATGTGCTAGATGGTATGATTATGGATGACTTAGATGATT 
TTCGTCCCGGATTiVAAAGCAAGAGACGACTTTGGTGTTCGTAGCGTTTTAC 
AAGAAGCAAAACTATCTTTAGAGCACAGTGGCGATGATATC 

15 

Sequence 3470 

step.1003f04.cons.ok 

AGGTGAAGATGTATTCTTAGGATTTGACTGTGCTTCTTCTGAATTCTATGA 

20 AAATGGTGTTTATGATTACACTAAATTCGAAGGTGAACACGGTGCTAAACG 
TAGTGCAGCAGAGCAAGTTGACTACTTAGAAGAATTAATTGGTAAATATCC 
AATCATCACTATTGAAGATGGTATGGATGAAAACGATTGGGAAGGTTGGAA 
ACAATTAACTGATCGTATCGGTGATAAAGTTCAATTAGTTGGTGATGATTT 
ATTCGTAACTAACACTGAAATTTTATCTAAAGGTATCGAACAAGGTATTGG 

25 TAACTCAATCTTAATCAAAGTAAACCAAATCGGTACATTAACTGAAACATT 
CGATGCTATTGAAATGGCTCAAAAAGCTGGATATACTGCGGTTGTATCTCA 
CCGTTCTGGTGAAACTGAAGATACTACAATTGCTGATATCGCAGTTGCTAC 
AAATGCAGGCCAAATTAAAACAGGTTCATTATCTAGAACTGACCGTATTGC 
TAAATACAATCAATTATTACGTATTGAAGATGAATTATACGAAACAGCTAA 

30 ATTTGAAGGAATTAAATCTTTCTACAATTTAGATAAATAATATCTCTTTAA 
TTTTTATAAGAGTTCACGCTTATTATTTAATATCTTAATTGAATTGTCCGT 
TGCAGAGTTGAAGCTGCAACGGGCTTTTTTAGTTAAGTACATTTTAAATTA 
TTAATGTAAACTGATATAATGATGTGAGTAATTTTGTAAAAGGAGTACAAT 
TATGACAGACTCAAATGCTAAAGAAATAAGAACTGGACGTTTAATTGCGAT 

35 AAGTTCATTAGTGTTTTGTATTTTACTTATCATACACCACTTTATTGTATT 
AGATGAATCAACAGCTAAATCAATTTTATCTTTAGCTGGTCAAAAAACATC 
AGATACAGCAGTGAAAAACATTTTAAATAGTGACCGATACACTGGAATTAT 
GTATATTTTAGCTTACTTAGCAGGTACTGTTGCTTTCTGGAATCGCCATCC 
ATATTTATGGTGGTTTATGTTTGCCGTATATATTTCTAATGCACTATTTAC 

40 ACTCGTAAATCTTTACTTATTTATTCAAGGTATTTTAGATGTAAAAAATGT 
ACTTGCAGTTTTACCAATTTTAATTGTAGTGATTGGATCTATAATTCTAGC 
AATTTATATGCTAGTTGTTTCTATTACACGTAAAAGTACTTTCAATAGATA 
GAAGTTTGATTACATCAATAAAGATGTGTTATAATTTCTTTCGTATATAAT 
GAACGGAGGACAATTTTTATGCATACATTAATCATCGTTTTATTAATTATA 

45 GATTGTATTGCATTAGTGACTGTTGTATTACTCCAAGAAGGTAAAAGTAAT 
GGACTTTCAGGTGCTATTAGTGGTGGCGCTGAACAATTGTTTGGTAAACAA 
AAACAACGTGGCGTCGATTTATTCTTGCATAGATTAACAATAATTTTAGCT 
ATTTTATTCTTTGTACTTATGTTTTGTATTAGTTACTTAGGTATGTAGAGA 
ATATTATATTTGTTTTAATGAAGTAAATGAATCTTAACCAAGACTGATAAA 

50 TGATAGGGCAAATTAATCACATTTGCTATTAAGGTGATGCTGCCAACTTAC 
TGATTTAGTGTATGATGGTGTTTTTGAGGTGCTCCAGTGGCTTCTGTTTCT 
ATCAGCTGTCCCTCCTGTTCAGCTACTGACGGGGTGGTGCGTAACGGCAAA 
AGCACCGCCGGACATCAGCGCTATCTCTGCTCTCACTGCCGTAAAACATGG 
CAACTGCAGTTCACTTACACCGCTTCTCAACCCGGTACGCACCAGAAAATC 

55 ATTGATATGGCCATGAATGGCGTTGGATGCCGGGCAACAGCCCGCATTATG 
GGCGTTGGCCTCAACACGATTTTACGTCACTTAAAAAACTCAGGCCGCAGT 
CGGTAACCTCGCGCATACAGCCGGGCAGTGACGTCATCGTCTGCGCGGAAA 
TGGACGAACAGTGGGGCTATGTCGGGGCTAAATCGCGCCAGCGCTGGCTGT 
TTTACGCGTATGACAGTCTCCGGAAGACGGTTGTTGCGCACGTATTCGGTG 
AACGCACTATGGCGACGCTGGGGCGTCTTATGAGCCTGCTGTCACCCTTTG 
ACGTGGTGATATGGATGACGGATGGCTGGCCGCTGTATGAATCCCGCCTGA 
AGGGAAAGCTGCACGTAATCAGCAAGCGATATACGCAGCGAATTGAGCGGC 
ATAACCTGAATCTGAGGCAGCACCTGGCACGGCTGGGACGGAAGTCGCTGT 
5 CGTTCTCAAAATCGGTGGAGCTGCATGACAAAGTCATCGGGCATTATCTGA 
ACATAAAACACTATCAATAAGTTGGAGTCATTACCTGCTATTAAGTATTAT 
CAGCCTGAGTCCGACGATGTACTTGTCGGACTTTTTTATTTTATAATTGAG 
ACAATCATGTTTTTAAATCATATGTTAAAGGGAATAATGATTTATCAAGTC 
AATTTGATAATAATGAAGAGATTTTTAAATATATGAAATAAATTGAAATAC 

10 TTAACTTATTTAGGATAAATTGAGTACTTACTATAAGGGAGAGGGATTTTA 
ATCATGCAAATTAAACTACCAAAACCATTCTTTTTTGAAGAAGGGAAACGT 
GCAGTGTTACTTCTTCACGGCTTTACAGGTAACTCTGCTGATGTAAGACAA 
CTTGGGCGTTATCTTCAAAAAAAGGGCTATACATCTTATGCTCCACAATAT 
GAAGGACATGCAGCGCCCCCAGAAGAAATATTAAAATCTAGCCCTTTTGTT 

15 TGGTTTAAAGATGTTTTAGATGGTTATGATTATTTAGTAGATCAAGGTTAC 
GAAGAAATAGCAGTAGCTGGTTTATCATTAGGTGGCGCCTTCGCATTAAAA 
CTAAGTTTAAATCGTGATGTGAAGGGGATTATAACTATGTGTGCACCTATG 
GAGAATAAAACAGAAGGTTCGATTTATGAAGGCTTTCTT6AATATGCACGT 
AACTTTAAAAAATATGAAGGCAAAGATCAACAAACGATTGATCAAGAAATG 

20 GAACAATTTCATCCAACTGAAACCCTGAGAGAACTGAGTGACACTCTAAAT 
GGAGTTAAAGAACATGTCGATGAAGTAATTGATCCAATACTTGTCGTACAA 
GCAGAACAAGATACAATGATTGATCCTCAATCAGCAAATTATATATATAAT 
CATGTCGATTCTGATGAAAAAGAAATCAAATGGTATCAACATTCAGGTCAT 
GTGATTACCATTGATAAAGAAAAAGAGAAAGTCTTTGAAGATGTATATCAA 

25 TTTTTAGAATCATTGGAATGGACAGAGTAAAAAATAAAGATGTAGGAGAGA 
AAGGAGGGGCATAATGAATTTAAAGCAATCCATCGAAGAAATGATAAAACA 
ACCTGACTATGAACCCATGTCAGTATCTGACTTTCAAGATGCGTTAGGTTT 
AAACAGTGCCGACTCATTTAGAGATTTAATTAAAATACTCGTTGAATTAGA 
ACAGTCTGGTTTAATTGAACGTACAAGAACAGACAGATATCAACGTAAACA 

30 ATCCAACAAAACAAATTCAAAACTAATCAAAGGAACGTTAAGTCAAAATAA 
AAAAGGCTTTGCTTTCTTAAGACCTGAAGATGACGAGATGGATGATATTTT 
TATTCCACCAACTAAAATCAATAGAGCATTAGATGGAGATACTGTCATCGT 
GGAAATTCAAAAATCTCGTGGAGAACATAAAGGTAAAATTGAAGGTGAAGT 
AAAATCTATTGAAAAGCATTCAGTTACACAAGTTGTTGGAACGTATAGCGA 

35 AGCAAAGCATTTTGGTTTCGTATTACCGGATGACAAACGTATTATGCAAGA 
TATCTTTATACCTAAAGGACAAAATTTAGGTGCTGTAGATGGTCATAAAGT 
ATTAGTACAAATTACGAAGTATGCCGATAGTACTGACAAT 



40 Sequence 3471 

step.1003f07.cons.ok 

TAAGCAATAAAGTTCTTCAATTTTATGTAAAATATGGAGGAGAATTCCAAT 
TAAAACAAGGATTTAGTCCAGCGTTCACCGTAGAAAATAAAGATGCTATTG 
ATATAGGTTTTGAACAAACATTCTATGAAATAAATGTCGTTATTGCGGAAA 

45 AAGATTTATGGTATTTTCAAGATGAAAAACTAACTGTAGATGCAATCGATC 
ATGAAGATGAGATCATCTACAAACGAAATTAATATAATTATACATATAAAA 
AAAGAAGCGATAAAAACTACTTCGCTTCTTTTTATAAATGATTAATGCCAA 
ACGTAACCTCAGCATCTTTACCTTCTTTATTTAACTGGTTAACTTTACTAT 
AAGTCTCATGCCACTCAGGAAAATATCTATCCAAGCGTATGGGTTTAAATG 

50 TATCTGCTTTAATACAGATAAGTTCTGTTGAACCAGTAGTAGCTAACTCTC 
CATTTTCATTATAAACCTCATAACAATAAGTTGAACGTAATCTAGAATATT 
TTTCCACCCATGTTTTTACTGTCACCTTTTCAGGATAGTAAATTGATTTTT 
TATATTGCACTTTTAAATCTAC7VACTGGTGAAATAACACCTTGTTCTTCCA 
TAGAGGCATAACTGAAGCCAAGCTTTCTTATATAGTCTGTTCTCGCAACTT 

55 CAAACCATGTTGCGTAGTTACCGT6ATAGATAACCCCCATTTTATC6GTTT 
CTTGATATCTTGCTTCAATTTCAGTCAAACTGTATATCATTGGAATAATCC 
TTCTTTCAACTTTAACTTATTCTTTCAATTAAGGATAACATAAAGCAGTCA 
TGCCAACACAATAGTTAGCATGACTGCTTGTACTCAATAGTGAGTCAAAAT 
CATATTAAATTACTGCGCAAGTTTATTTCTTAGTACCATTTGTAAAATACC 
ACCATGACGATAATAATCTAATTCTACTAGTGAATCAAAACGTACAATAGC 
TTTGAAATTAATGATTTCACCATTTTCTTTTTTTGCAGTCACATTTACAAG 
ATCATGTGGCTGTACATCTTCATTAATATCTACAGATATTTCTTCTTTTCC 
ATCAAGACCCAGTGCTTCTGCAGATTCTCCTTGTTGGAATTGAAGCGGTAG 
5 TACACCCATCATAACTAAGTTAGAGCGATGAATACGTTCATAGCTTTGTGC 
AATGACAGTTTTAACTCCTAATAAATTGGTACCTTTTGCAGCCCAGTCACG 
AGAAGATCCCATTCCATAGTCATTACCAGCTAAGACAACTAAGCCAGTTCC 
ATCTTCTTTATATTTCATTGCTGCATCATATATAGGCATTATTTCTCCGGT 
AGGCCAATATGTTGTAAATCCGCCTTCAGTACCTGGAGCAAGTTGGTTTTT 

10 GATACGAATATTGGCAAATGTACCACGTACCATAACTTCGTGGTTACCGCG 
ACGGGAACCATAAGAGTTAAAGTTGCGAATTGCAACATCATGATCTAATAA 
GTATTTTCCTGCTGGTGTATCTTTACCGATAGCACCTGCTGGAGAAATATG 
GTCTGTTGTAACAGAATCACCAAATTTACCCATAACTCTCAAACTTTTAAG 
TGGTTCAATTTTACCCGGCTCTTTAGATAATCCTTGGAAAAATGTTGGATT 

15 TTGAATATATGTTGAATTAGGATCGAAATCATATAATGGTTCATCGGTTAC 
ATCTATTTCATTCCACATTTCATTGTTATGATATACATTTTTATATTCTTC 
TAAGAATAGTTCAGGTGTAACAACTTTATCTACAGTATCAGAAACTTCTTG 
TATTGAAGGCCAAATGTCTTGTAAATATACATCTTTACCATCTTTACCTTT 
ACCAATTGGTTCATTTTGTAAATCAATATCTACTGTACCAGCAAGCGCATA 

20 AGCTACAACAAGTTGTGGTGAGGCTAAATAGTTTGCTTTCACTAATGGATG 
GATTCGCCCCTCAAAATTACGATTACCTGATAAAACTGAAGTTACTAATAA 
ATCTTCATCCGCAACTGCCTTTTCAATTTCAGGTAATAGTGGCCCTGAGTT 
ACCAATACATGTAGTACAACCATAACCAACAAGATTGAAACCTAAATCATC 
TAAATACTGTTGTAATCCAGAATCTCTTAAATATCCTGTAACAACTTTTGA 

25 ACCTGGAGCAAGTGACGTCTTAACAAACTCTGGTACTTTCAATCCTTTTTC 
TACAGCTTTTTTAGCAACTAAACCAGCACCTAACATAACATATGGATTAGA 
AGTGTTAGTACATGAGGTAATAGCAGCAATTGCTATATCTCCTGTTTTCAT 
TGTTGCTTTAGATCCATCATTAAAATTAATTTCTGCTTTTTTATCAAATTC 
ACTTTGATCAAGTCCATGTCCTTGATTACCAGCAGGAGCAGTTACTGATTT 

30 TTCAAATTCTTTTTTCATATCACTTAAGAAAATTAAGTCTTGTGGACGCTT 
TGGACCAGAAAGTGATGCCTCTACTGTAGATAAGTCTAAATCAATAACATC 
TGTATATTCAGGATCCTCTTTTTCTACATCAAAGAACATATGGTTTTGTTG 
TAAATATTCTTTAACCAATTCAATATGTTCTTCGTCACGGCCTGTAAGTTT 
CATATATTTCAATGATTCTTCATCTACTGGGAAGAAACCACACGTTGCACC 

35 ATATTCTGGAGCCATGTTAGCAATTGTAGCTCTGTCTGCTAATGGTAAATG 
TTGAACACCTGGACCGAAGAATTCAACAAATTTTCCAACTACACCTTTTTT 
ACGTAATTCTTCAGTCACACGTAAAGCTAAATCCGTAGCTGTTGAGCCTTG 
TGGTAAAGAGTGAGTTAAACGCACTCCGATAACTTCAGGAATTGGGAAATA 
TGATGGTTGTCCTAACATACCTGCTTCGGCTTCGATACCACCAACGCCCCA 

40 ACCTAGAACACCAATACCATTAATCATTGTAGTATGTGAATCAGTACCTAC 
TAAAGTGTCAGGAAATGCTGTTTTTTCACCATCAACATCTCTTACATGTAC 
TACATTTGCTAAATACTCTAAGTTTACTTGATGGACAATACCTGTAGCAGG 
AGGTACTGCATTATAGTTATCAAAAGCTTTTGTTGCCCAGTTTAAAAATTG 
ATAACGTTCATAGTTACGTTCAAATTCTAATTTCATATTACGTTCTAATGC 

45 TTCTGGATTAGCGTAACTATCAACTTGAACTGAATGGTCGATAACTAAATC 
CACAGGTACTTCTGGGTTGATTTTATTAATATCTCCACCAACATCATTCAT 
AGCTTTACGTAAAGAAGCCAAATCTACTACTGCTGGCACACCTGTAAAGTC 
TTGTAAAATAACTCTAGAAGGTTTGAATGGAACTTCACCTTCGTTACCTGC 
ATTTCCGAATTTACTTAATGCTTTGATATGATCATCTGTTATAACAAAATC 

50 ATCCTCTTGTCGTAACACAGATTCTAACAATACGCGAATTGAGTATGGTAA 
TTTA 



Sequence 3472 
55 step.1003f08.cons.ok 

AGCCACTGACCGTATTTACGGCCAGTGGCTTATTAAATATAATATAAAAAA 
CCGAGCGTATAAACGCTCGGATATAATGATTTATTTTAATTGTAAATCAAC 
TTCATTTATTTCTTTTAAAATTTCAGCTGTATCATAGCAATTTGTAGCGAT 
ATTTGCATATTTCTCTGCTAAACGATAAGCATTAATGTATAAAATTAAACT 
TTCTTTAGCGATATCTTCAGCGTCATCAGATTTGGCTGGATTTGAAGTTAC 
AACTAATTCAGCAAATTTTTCAGGATCAATATTAATAGACATGAATGAATA 
AACAACTCCTATATTTTTATTAGTTTCATATTAACACAATACTTAAATACA 
AAAAAGTAAAGTCTTTTAATTTTCAGAATAATTAATTTGTTAATGAGACTT 
5 TTATTATAGTACGGTATCCATTAAACTTAATTTATGAGAGGATGAACAGAG 
AATGAGAATATTGTTTATAGGTGACATCGTTGGTAAAGTGGGCAGGAAAAT 
GATTACTACTTATTTACCTAAAATTAAACAAACTTATCACCCAACAGTTTC 
TATAGTAAACGCTGAAAATGCCGCACACGGTAAAGGATTAACAGAAAAAAT 
TTACAAACAACTTTTGAGAGAAGGCGTGGATTTCATGACTATGGGTAATCA 

10 TACATATGGTCAAAGAGAAATTTACGATTTTATTGATGATGCTCATCGAAT 
GGTGAGACCTGCAAATTTTCCTGATGAAGCTCCAGGAACAGGTATGAGAAT 
AATAAAAATTAACGATATTAAATTGGCTATTATTAATTTACAAGGCCGTTC 
ATTTATGCAAGACATTGATGATCCATTTAAAAAGGCTGACCAGCTAATCGA 
AGAAGCTCAAAAATCTACACCATATATATTTGTAGATTTTCATGCTGAAAC 

15 TACATCTGAAAAAAATGCTATGGGTTGGTATTTAGATGGTAGAGTGAGCGC 
TGTTGTTGGTACTCACACACATATTCAAACTTCTGATGATCGTATATTACC 
TCATGGCACAGGATATATCACAGATGTCGGGATGACAGGTTATTACGATGG 
TATTTTAGGTATCAATAGAGATGAAGTTATTCAACGTTTTATTACTAGTTT 
GCCACAAAGGCATGTTGTTCCAGATGATGGGCGAGGCGTATTATCAGGAGT 

20 TATCATAGATTTAGATAAAGAAGGTAAAACGACTCAAATAAAAAGACTGTT 
AATAAATGAGGACCATCCTTTCCAAATTTAAAGGGATATAGTCCATTAGTA 
TGAATTTAAACTTGTACCACTGTTGAAATAGTAAAACGGTGGTTTATTTTG 
TTATCATAGAGTATGAATTAATACCACAGGAGGCATGTGATATGAAATCAC 
AAATATCATGGAAAGTGGGCGGTCAGCAAGGCGAAGGTATTGAATCTACCG 

25 GTGAAATCTTTGCTACTGCGATGAATAGAAAAGGTTATTTTTTGTATGGAT 
ATAGACACTTTTCTAGTCGTATAAAAGGTGGCCATACTAATAATAAGATAA 
GAGTTTCAAAATCGCCTGTGCATGCGATTAGTGATGATTTGGATATACTCA 
TTGCTTTTGACCAGGAAACGATTGAATTAAATCATCATGAAATGAGAGAAG 
ATAGTATTATAATTGCGGATGCTAAAGCAAAACCCCAAAAGCCAGAGAACT 

30 GTGTGGCTCAATTAATTGAGTTACCATTCACTAGCACGGCAAAGGAACTTG 
GAACAGCATTAATGAAGAATATGGTGGCAATTGGTGCGACATCTGCACTGA 
TGGATTTAAATACATCAACTTTTGAAACTTTAATCGATAACATGTTTTCAA 
AAAAAGGTAATAAAGTCGTTGATATGAATATACAAGCCCTTAATATGGGTT 
ATGATTTAATGAAGCAACAAGTTACCAACX3TTAATGGAGACTTTACATTAG 

35 AGAATGGTAGCGGTCATCCTCATTTATATATGATAGGTAATGACGCAATCG 
GATTAGGAGCAATAGCAGCTGGATCAAGATTTATGTCCGCTTATCCAATTA 
CGCCAGCTTCTGAAATTATGGAATACATGATTGCCAATCTACCTAAAGTTG 
ATGGTACTGTTGTTCAAACTGAAGATGAAATAGCAGCAGCAACGATGGCGA 
TTGGAGCTAACTATGCTGGCGTACGAGGCTTTACAGCGAGTGCGGGTCCAG 

40 GTCTTTCTTTAATGATGGAATCTATTGGATTGTCTGGTATGACTGAAACGC 
CATTAGTCATTATTAATACTCAAAGAGGTGGTCCTTCTACTGGCTTACCAA 
CAAAGCAAGAACAATCAGATTTAATGCTUU^TGATTTATGGTACCCATGGTG 
ATATTCCGAAAATTGTCGTTGCTCCTACAGATGCTGAAGATGCGTTTTATC 
TTACTATGGAAGCATTTAATTTAGCTGAAGAATACCAATGTCCAGTCATTC 

45 TGTTAAGTGATTTACAATTATCATTAGGAAAACAAACTGTTAAAACACTCG 
ATTATAATAAAATCGATATTC6TCGTGGAGAAATAATACAGTCAGATATCG 
AGAGAGCTGAAGATGATAAAGCATACTTTAAAAGATATGCATTAACAGCTA 
GTGGCGTATCACCACGACCAATACCAGGTGTTAAAGGTGGTATACATCATG 
TAACA 

50 

Sequence 3473 

step.1003f10.cons.ok 

ATGGTACGTATGAAATGGCAAATACCCTAGCTGAAAACAATAATCCCTATG 
55 CATATGTTGTCAGTGCATTT^GAAAAACGTTAAAAGCGTGTATCGTGCCTT 
TGAAAGAGGTGGAAAATCATGCTTAATCAATTTGTATGGGTTATTTATCCA 
TATTTATGTTTAGCAATATTTGTTATTGGACATATCGCAAGATATAAATAT 
GATCAATTTTCATGGACAGCAAAGTCGAGTGAAATGATTGAAAAGAAACGA 
TTAAAATGGGGAAGTTTACTTTTTCATTTAGGTATTATTCCGGTATTCTTT 
GGTCATGTTGTGGGATTATTGATTCCCGCTAATTGGTTAGAGGCGATAGGA 
GTAAATAATCATATCTATCATATA6GTGCAGTTTATATAGGTA6TGTTTTT 
GGTATAATAACATTAATAGGAATGTTGTTGCTAACTTTAAGACGACTATCC 
ATCAAAAACGTTAGACGATTAAGTTCATTTTCAGATATATTTGTGAATATC 

5 GTTTTGTTGATTATTTTAATAATGGGTTGTTATTCTACGCTTGTAACCAAT 
GCGATTCAACCTGAATTTGATTATCGTCAAACCATTGCGATATGGTTTAGA 
CATTTATTCATGTTTTCTCCAAATGCTGACTTAATGTTAAACGTGCCTTGG 
TCGTTTAAACTGCACATATTATTAGGGTTTACAGTGTTTGCGTGTTGGCCA 
TTTACTCGTTTAGTACATGTTTGGAGTGTACCACTGTCTTATATGAACAGA 

10 AGATATATTGTTTATCGCAAAAACAAAATTTAATTTATTATGAGGTGAAAG 
TGTGGAACCAGAATCTTTAAAACATAATCAACATTTACAAATGAATTTAGA 
TAAGTTGAGAGCACAAGAGGGTTATGATTTTGGTGGTATCGCTTTATATGA 
TTATCATCACACTTCATCACCAATTAAATGGCAATATGTTTCAGGTAACAC 
AAATGATAGATATAAACTTATCATTTTGAGAAAGGGTAGAGGGCTTGCTGG 

15 AATGGTGATGAAAACCGGTAAGCGTATGGTTATTGCTGATGTAGATACAGC 
TTTATCTCCAGAAGAGAAAGTTAAATTTCCAATCATTCTTAGTGAGTCATT 
GACAGCTGTAGTTGCAGTCCCTTTATGGTTAGAAAATTCAATGTATGGCGT 
TTTATTATTAGGTCAAAGAAATCATCA6CCGTTACCTCAGTCATTGGACCA 
ACTTAATATTGAAAAACAAATCGGTATTTTTACAGAAATAAACTAGGTGGT 

20 AAATATGTTGGAGCAAACTGATTTAAGTTTAGAGCAATTACTTAAGAATTA 
TTATGAAACCACGAACGAGAAAATTGTATTTGTTAATAGACAAGGCAAAAT 
TATTGGTATGAATGACGCAGCAAAAGATATTTTAACTGAGGAAGATAATTA 
TAATGCTATGACAAATGCGATTTGTCATCGATGCGAAGGATACTCTAATGA 
ATATGATGTACAATCGTGTAAAGATTGTTTTTTAGAGACAACGCAATTACA 

25 ACATTCCAATTTCCAAGTATTTATGAAGACAAAAGATAATGAAATTAAGCC 
TTTTACAGCTATGTATCAAAATATTGATGAACAAAGAGGTATTAGTGCATT 
TACCTTACAGAATGTGGCGCCTCAGATTGAAAGGCAAGAAAAAATGTATCA 
ACAAAAAATGTTACATCGTTCAATTCAAGCACAAGAAAATGAACGAAAGCG 
TATTTCTAGAGAATTACATGATAGTGTAATACAGGATATGCTCAATATAGA 

30 TGTTGAACTAAGGCTTTTGAAGTATAAGCACAGGGATAAGGTGTTAGCTGA 
AACATCTCAACGTATAGAAGGCTTATTATCACAGCTTATTGATGATATTAG 
AAATATGTCTGTTGAATTAAGACCTTCTTCTCTCGACGATTTAGGCATTGA 
AGCAGCTTTTAAATCATATTTTAAACAGTTTGAAGAAAATTATGGTATGCA 
TATTAAATATGATTCGAACATTAAAGGCATGCGTTTTGATAATGAAATTGA 

35 AACAGTTGTGTATCGTGTAGTTCAAGAGGGTGTATTTAATGCTCTAAAATA 
TGCTGAGGTTAATGAAATTGAGGTAAGTACGCATAGTGATGGCAAGCAGCT 
TGTAGCAGAGGTTGTGGATCGAGGTAAAGGGTTTAGTTTAGATCATCACCC 
TAAAGGCTCTGGACTTGGATTGTACGGAATGAGAGAACGTGCAGAATTAGT 
TAACGGTCATGTTAATATAGAGACACATATTAATAGAGGTACTATAATTAC 

40 ATTAGATATACCGATTTAAGTTGATGATGTAAGTTGGGGGAATTGAAGTGA 
AAATAGTTATAGCGGATGACCATGCAGTTGTTAGGACAGGATTTTCAATGA 
TATTAAATTATCAAGAAGATATGGAAGTTGTTGCAACTGCAGCTGACGGGG 
TTGAAGCTTATCAAAAAGTGTTAGAACATCGACCAGATGTTTTAATTTTAG 
ATTTGAGCATGCCGCCAGGAGAGTCAGGCTTAATCGCAACCAGTAAAATTT 

45 CTGAAAGTTTTCCTGATACTAAAATTTTAATACTTACGATGTTTGATGACG 
AAGAATATTTATTTCATGTGTTAAAAAGTGGTGCTAAAGGATACATTTTAA 
AAAATTCACCTGATGAGCAATTAATATTGGCCGTACGTACAGTATATCAAG 
GTGAAACTTATGTTGATATGAAATTGACGACGTCTTTAGTCAATGAGTTTG 
TCAATCAATCACAAACGGATGAAGTGTCATCATCTTCAGATCCATTTAAAA 

50 TTTTATCGAAACGAGAGTTAGAAATATTACCTCTTATAGCAAAAGGCTATG 
GCAATAAAGATATTGCAGAAAAGTTGTTTGTATCGGTGAAAACGGTAGAGG 
CACATAAAACGCATATTATGACGAAACTAAATTTAAAGAGTAAACCTGAAT 
TAGTTGAATATGCCTTAAAGAAAAAATTATTAGAATTTTAAATGTATATCA 
TGATTGAITATTAGGGCTTACTTAGTCTATGACTAAGTAGTCCTTTTTTTG 

55 TACCTTTATATAGGGAAAAATAAAAGTAAAATCAGGGTATGTCCTTATGTT 
CATATTTTCACAAACATATTAAAATGAATTTGTGATTGGTACTTTGTTTAT 
AATTTTGTTTTATATTATTAATTTTGACCTCTTAT 
Sequence 3474 

step.1003f11.cons.ok 

TTCGTCATTGTTCGACTTTATTTGTTAATAAAATAGATAAGATAGAGGTGG 
AAGAAACCGCCCGCTTGCTACGTCAACTCGAGCGTCTCAATAGCGATGCCA 
5 ATATTCAAGTTGGTCAATTTGGAGAATTAAATTTAAAATCACTGCTAGAGC 
CAACACATATAAATTCAAATGCATGTGGCACTTTGCATAGTAATATAAATC 
ATCAATTCATCGAAAATCCTAGGCTACAAACAAAAGAAGAAATGATTAGTG 
CGTTAGATAACTTGCCTCAAGATGTTTACCGTGTCAAAGGGTTTGTTCGTT 
TTTCAGATCAGCAACACGTTTATTTAGTACAGTATGCACAAGGAAATATAG 

10 AATTATCTCCCATTCAACTTT^AAAACGATGTACCATTGTACCTCATTGTTA 
TAGGAAAACATTTAAAACAAATACAATTTGATTTATAAAAACACATCTCTC 
AATTATCATCAGTTAAACTTTATACATAAACAAACCTCTAAATGACTTTTG 
TTCATTTAGAGGTTTTTGGTACTTATTTCTATTGTTCGTCTTTTATTAATG 
TTTGCGCTTTTGAACTGATTCCATGCTTACTTCTACTCATTATTTACCTCA 

15 CTTGGATTAAATTCCAAACTATCCGCAAAGTAGACGCTTTCTTCTTTATTG 
TCGACAAAAGAAAACTCGATATCTTTATAACCTATAGATGAACCTTTAAGG 
TCACCGTCCCCTTTT/U^TATTAGTTTAGGAGCTCTCTTAATCGGAATATCA 
TACCTCTTTCTGAGTTGCTTCACATTGTCATCTTCATTACTTAATTGGTAC 
TCTGCCGAATAGCTTGGTACGTTTGGGTTATAAGACACATTTCCATTCTTG 

20 TAGTCTTTTAAATTTTTAAAATTCCCATATTGAGAGAAAAACTGAAAGTTT 
TTAATTTCATTTTTTAATTTTTCATCGGCGATAGGTTTAGTCGGTTCAATT 
TTATTGTTTTTAAGGCGAACCGGATATTCTTTATCTTTACTGTGCACTCTC 
CCTTTTTTATCTTCTGTAATTATATTTGTATAAAAATGACCAGTCGTCTTT 
CTAGTATTTCTATTCATATACAAAACCATACCTCTAGATTTCATCGCTTGA 

25 TCTTTTTTCTGiyVTATTCATTTCTGAATTAATAATCCATGTCCCTTTGTCC 
TCTTTTTCAAATTCTTCATCACGATAGCCTTCTTTATCGTATAAATCTTCT 
AAATTTTTAATTGGATACATACTTAATGTTTTATTAAAATTTTCTTTAATT 
TCCGCTTCTTTATTATTTTCTTTATTCATAAATCCACATCCGCCAATAAAA 
AACGTTAATATTAAAATGCTTATGTAAGAAATGCACTTCCCATAATATTTC 

30 ATCTTTTCACATCTCTCTATTATTGTATACATTTATTAAATATACATTTTA 
TTGATATAAATATCAGTTTATAAAAATTCAATTTTATATCATCAAATTATT 
CATACACATTCAATTTAATTTATTTATATGTTTTTTTCGGCGAAACAATCA 
AAAAACCTCATTAAATGTACCGCAATCATACATCTAATGAGGCTATTATCG 
TTCTTTTAATCGCTACTTACGTGTAATACCATCTAACTCTTCGACTTTCAA 

35 CATATTATTTATCATACTCATATAGAATTGACCTAATTCTTCTCCAGTTCG 
CTTATAATCAGTAGTATCTAAGCCGACTAAATCTAGATGATCCCAATCATA 
TTGTACGGGTCTCACTTGCCAAATACCCTTGTCAGTTGCTAGGTTCATCAT 
ACCTACCTTCTTAAATGCTTCATCACTTGGATGTAGTGAAGAAGACACAGG 
TACTATGCCATCATTTACTCTGACATTTTTATTATCATCTCCACCTATCAC 

40 ACGACTTGTTAAATCGAATAGTGGGAATTGTCTAATATTCGGCACTTCATT 
GCCTAATGGTCCAGTATGTGTTGCAGCACCTGTATATGATGTATAGACGAT 
ATTAGGATTCAATGTCGTCATTTGGTTTAACTTTTCTGCTCCAACAGTTGT 
TAAATCATTTACAGCCTGATCTTCAGTCTCCCAAACTTTACTATTCGCTAT 
ACGTTTTGCATATTCAGCATATGATTCATTAGGTTTCTGTTTGAAGCCCCA 

45 TTGAGAAAAACCTAGTTCTAAATCGAGCGCTTTAGTTCCACCAATTTTTCC 
AATTCTATTAATTGTATCTTTGATAAATTTAGTCGACCCTAGTTTATCTGC 
AGCAGGTGTGCCATTATGAGGTGTTCCTAATGTAGTAATCGTAGACACCAT 
GTTATCTTGTCCACCTTTAAACAAATCAGATACCGTACCACCATATTGACG 
TTGATAATCTATTTCTTCTTGATTTCCATTTCTTAAAAAATGTTCCATCAA 

50 GCGTATCGTTTGGCCACCCATACTATGTCCAACAAGATGTATCTTTTTACC 
TGGTTCCCAATCAGGCATGATGCCTTCATATGTTCTGCCATAACGCTTGTG 
ACCATATTTTGCAGCATGTGCTGCACCATAATCTACTCTTCCACCTTTAAT 
ATAATAATACAGTTCAACAGCACGGTCATAATTGCTGCTAAATGCTCCTAC 
ATTGGCTTCGTGAACTCGGTAACCTAATTTTGTAAGTTCTTGTTTCACGTT 

55 ATATTTAGTACCACCCCAATAATTTGGGTACATGCTCAATGAATCTTCACC 
GACTAAACCTACAAATCCATGTACAAACACGACTGGATATTGATTTTTATA 
TTGAGCTTGCGCAGTAAGTTGATTGATATGTTTAGTTTGTTTTGATTTCAC 
AACGCGTGGTGAAGTTAATGATGTTTGCTTTTTTTGATTTGTTTTAACTGT 
TTTATCAGTTTTTGTATTACCTATTTTGTGTACTTTAGAAGATGGTTGTTT 
TACTGTTTTATCTCTATCTCCTATAGACTGTGATTGTTTAGGTTGTGTTCC 
TTTTGAAGATGGGGCTTGAGTTTGATCAGATGGGTGTTGCTTTTTATCTGA 
TGAATCTTGATCATTATTTTGGTTTTGAqTATTAGGTTGATGTTCTGTTTG 
TGAGTGTTGATTATCTTTAAACTGTTGTTTiScTTTATCATCTTGATTT^ 
5 GCCTTTTGGATATGTATCCTGACGTGAATTTGGGGTAACATGATTGTCTTT 
TGATTGTTGCCCCTTTAAAGTTTGTTCTTGGTCATTTAATGGTGACGTTTG 
TGTTGAATCTTGTTGAGTTTGATTATTCTTTTTTTGTACATTTTCATCAGA 
TTGGTTCTCTTGATTAGTCTTTGATTGGCTAGCTGTGTCACCCTCTTGATT 
CAATTCGTATGATAAATGAGTTGTATCTTCTTTGTGTTCAACTTGTTCTGT 
10 TTGTTGTGACTGTTCGTTAAAATCTGATGATGATTGTGTCATTTCAGCAGC 
TTGAGC 



Sequence 3475 

15 step.1003f12.cons.ok 

AACGAAATACAAGAAAATGTGACTGAAAAGACGAAAGGTCATAAATTAAAA 
AAAAGTGCAGCTAAAACAACAGCCCTAGTTGGTGGAGCATTTACATTTAAT 
ATGTTGAATAATCATCAAGCATTAGCTGCCTCAGAAACACCAATCACCTCT 
GAAATTTCATCCAATAGCGAGACAGTAGCCAATCAAAATTCAACTACGATT 

20 AAAAACTCACAAAAAGAAACAGTC7VATTCTACAAGTTTGGAATCTAACCAT 
TCTAATAGTACAAATAAGCAAATGTCTTCAGAAGTTACAAATACAGCTCAA 
TCAAGTGAAAAAGCTGGAATTAGTCAACAAAGTAGTGAAACATCAAATCAA 
TCATCTAAGTTAAATACATATGCCTCCACAGACCATGTAGAGAGTACAACT 
ACTAACAATGATAATACTGCACAACAAGATCAAAATAAGTCTTCGAATGTA 

25 ACCTCTAAGTCAACACAATCAAACACGTCATCCTCAGAAAAGAGCATTAGC 
TCCAATTTAACCCAGTCAATCGAAACAAAAGCAACCGATTCATTAGCGACC 
AGTGAAGTACGTACTAGTACAAATCAAATATCTAATCTGACATCAACATCT 
ACTTCAAATCAATCGAGTCCTACTTCTTTTGCAAATTTAAG7UVCATTTAGT 
AGATTTACCGTTTTGAATACGATGGCAGCACCGACAACAACGTCCACGACA 

30 ACAACTTCAAGTCTGACATCTAATTCTGTTGTGGTGAACAAAGATAACTTT 
AATGAACATATGAATCTATCTGGATCTGCGACGTATGATCCAAAAACAGGT 
ATTGCTACCTTAACGCCAGATGCATATAGTCAAAAGGGTGCCATATCTTTA 
AACACTCGATTAGATTCAAACCGTAGCTTCCGTTTTACAGGTAAAGTTAAC 
CTTGGTAATAGATATGAAGGTTATTCTCCTGATGGTGTAGTAGGTGGAGAT 

35 GGCATTGGCTTTGCATTTTCACCAGGTCCTTTAGGACAGATAGGTAAAGAG 
GGGGCTGCCGTTGG7UVTAGGTGGTTTGAATAATGCCTTTGGATTTAAATTG 
GATACGTATCATAACACATCACCTCCTAAATCTGATGCTAAAGCAAAAGCA 
GACCCACGTAATGTTGGTGGTGGTGGCGCTTTTGGTGCCTTCGTAAGTACA 
GATAGAAATGGTATGGCTACCACTGAGGCATCATCTGCGGCTAAATTAAAT 

40 GTACAACCTACTGACAATTCATTCCAAGATTTTGTCATTGACTATAATGGT 
GATACAAAAGTGATGACAGTGACGTACGCTGGACAAACTTTTACGAGAAAT 
CTTACAGATTGGATTAAAAACAGTGGTGGTACGACGTTTTCTCTATCTATG 
ACTGCCTCAACTGGTGGCGCAA7UVAACTTACAACAAGTTCAATTTGGAACA 
TTCGAGTATACAGAATCAGCTGTTGCTAAAGTACGCTATGTAGATGCAAAT 

45 ACTGGTAAGGATATTATTCCACCTAAAACCATTGCAGGTGAAGTTGATGCG 
ACTGTGAATATAGATAAACAATTAAACAACTTGAAAAATTCAGGTTACAGT 
TATGTTAGTACAGACGCTTTACAAAACTCCAATTATTCAGAAACATCAGGT 
ACACCTACACTTAAATTAACTAACTCAAGCCAAACGGTGATTTATAAATTC 
AAAGATGTTCAAGGTCCTCAAATTAGTGTTGATAGTCAAACTAGAGAAGTT 

50 GGAAAGACCATTAATCCAATTACAATTACTACAACTGACAATAGTAAAGAC 
GTATTAACTACAACTGTGACAGGTCTACCTTCAGGGTTATCTTTTGATCAA 
ACGACAAATACAATTACTGGCACGCCAAGTGAAGTAGGAACTACAACTGTG 
ACAGTTAATACTACTGATGCTACTGGGAACGTAACATCTAAGCAATTTACA 
ATAACGATTCAAGATACAATCAGCCCTGTTGTAAATGTGACGCCAAGTCAA 

55 GCATCAGAGGTCAACGACAACAGCTATAAAGAACCATAAAATATTACTTAT 
TCCACCTAGTACAGCGACAATAGGACCTCCATGAGCAACTTTATCACCGAC 
ACCTCCAACCGCCGCTATTACTGAAGCAATCATCGCGCCAATCATATTGGC 
TGGAATGATTCTTAGAGGATCTTGTGCAGCAAAAGGGATTGCCCCTTCAGT 
AATACCAAATAATCCCATCGTAAATGATGCCTTACCCATTTCTTGTTCTGC 
TTTATTGAATTGGTGTTTACGAACAAATGTAGCTAAACCTAAACCAATCGG 
TGGTGTACATACTGCTACAGCAACCATTCCCATCACAGTGTAGTTGCCTTC 
AGCAATTAATGCAG7VACGGAATAAGAATGCTACTTTGTTTACTGGACCTCC 
CATATCAAAAGCAATCATCGCGCCAATAATAAGAGCAAGAATAATGATGTT 
5 AGCACCTTGCATTCCTTTTAACCATGATGTT/^TGCACCAAATATATTTGA 
AATTGGTGCGCCTATTACAAATATAAAAATGAGACCAACTATTAAAGAAGA 
TAGAATAGGTATAATAATAATAGGCATAATAGGiGCCATAGCTTTAGGAAC 
TTTAATCTGTTTAATCCATTTTGCAATATAGGGfcGCTi^GAAACCTGCGAC 
GATACCACCAAGGAAACCGGCTCCTGCTTCACTTCCATATAAACTACCATC 

10 AGCAGCAATGGCACCACCAATCATACCTGGAACAAGAdGAGGCTTATCAGC 
AA':*ACTCACCGCGATATAACCAGCAAGGATGGGAACCATGAATTTAAACGA 
TAAACT/iCCAATATTTTCAATAGATTTCCAAAATGAATCTTCGGGGATAAC 
TAATCCTTTTGGAGTGGTGTGTCCTCCAAGAGTTAAGGCAATAGCTATGAG 
TAAACCGCCAACGACGATAAATGGAACCATGAAGGAAi^CACCATTCATTAA 

15 ATGCTGATATACCATTTGAATATTACTTTTATGAGATT 



Sequence 3476 

step.1003g01.cons.ok 

20 cgccaaaactgatattgaatgga6atgttgatgtagcttctgtagatgacg 
atcaatattggcagtatccaccttttaaacttaccaaciy^gmgaa 
tatacggtcgtggcgttagcgatatgaaaggtggtatgtettcat.tattct 
acgtcttggagcaattacatcaagaggggcaacgtccagaaggtgatatta 
ttgttcaatcagtagtcggtg/iagt^gtaggtgaagcaggaactaaacgtg 

25 CATGTGAAATAGGACCTAAAGGTGACTTAGCCCTTGTCTTAGATACGAGTG 
AGAATCAAGCACTTGGGCAAGGTGGCGTGATTACCGGATGGATTACAGTTA 
AAAGTAAAAATACAATACATGATGGTGCGCGTAGTCAAACGATACATGCTG 
GTGGGGGCTTGTTTGGTGCAAGTGCCATTGAAAAAATGACAAAGGTGATTC 
AATCGCTTAATGAACTTGAAAGGCATTGGGCTGTCATGAAGAAGAGCCCTG 

30 GAr..TGCCTCCAGGTGCGAATACAATTAACCCAGCTGTCATAGAAGGTGGAC 
GTCACCCVGCATTTATTGCAGATGAATGTCGATTATGGATTACTGTTCATT 
ACTTACCGAACGAAAGTTATGAATCTGTAGTTAATGAAATAGAGCAATATT 
TAAATAAGGTTGCAGAAGCAGATGTATGGCTCAGAGAGAATCCACTTGAAT 
TTGAATGGGGTGGTACATCCATGATTGAGGATAAAGGAGAAATCTTCCCAA 

35 GTXTCACTGTTCCGACACATCATCCAGGTTTTAAGCAATTAGAAGAAGCAC 
ATGAACATATTCATAATAAAAAGCTTGAACATGGTATGAGTACAACTGTAA 
CTGATGGAGGTTGGACAGCACATTTTGGCATTCCCACGATATTATATGGCC 
CAGGTAGTTTAGAAGAGGCACATAGTGTAGATGAGAAAATAAAAGCAAAGG 
AATTAGCTCAATATAGTGATGTTTTATATACATTTTTAAAAGAGTGGTATG 

40 CACACCCACAATCCTATAAATCATCATAGATAAAAAAGAGGTACAAGCACG 
ACCTTTTATACTCACAAAATGTTGAGTTAAAGTGATGTGTTCTGTACCTCT 
TAATTATTATAAAGAATAACAAGTTGTATATTTATAAAAAGATATTTATTA 
ATTTCGTGTTCATTTCATATAAAATACAATTGTCAAGGTGAGTTTATCTTT 
TAAATGTACTTTTT7\ATTTTCGTATCCAACTACCACCACGTCTAGAATGAA 

45 CGGGTTGTCCAGAAGGTTGTACCGATTGTTTTTTTAGTTCTTCTCCATACT 
TAATTGCTTCTTGCATCAAGTAAGATTGAGAGTTTAAAGGAGATGAATTGT 
TTTCGACATAATGATAAACGCCTTGTGCATCTTGGTAACGACCTTTTTGGT 
TGTCAGATAATTGTAAATTCATATAGTTAACTAATCGTTGTCCTATTGATT 
TATCTTCAACAGGGAATAATATTTCGACACGTTTAATCATATTACGTGTCA 

50 TA;:CGTCAGCTGAAGATAAATAAATATGCGCCTCACCATTATTATGGAAGT 
AGTA^iArACGTGAATGTTCAAGCAAACGACCCACAATACTAACAACCTCTA 
TATTTTCGCTAATACCTGGAATGCCTGGTTTAAGACAACATATACCACGAA 
TGATGAGTTGTATTTTAACGCCTGCTTGGGATGCTTCGAAGAGCTTTTCGA 
TAATCGTTTTATCGGTTAAAGAGTTCATTTTCATCAT7\ATTTTACCGTTAC 

55 CATGTTGTAAATGACTACGTATTTCTTTATCGATACGATCAATGAAGACGT 
CTCGAATATCGTATGGTGCTACAATCAATTTATTGTATTCTGGTTTTGTTG 
AGTAACCACTCAAGTAATTAAAGAAGTTAATTGCATCCTCAGCGATATCTT 
TATTTGTCGTGATGATACCCATATCTGTGTATAATTTAGCAGTTTTATCGT 
TATAGTTACCTGTGCCTAAATGAACAAATGACGTAAGTTTATTGTTGATGC 